Thatweasel t1_j1zsodd wrote

Yeah, with current approaches you need *someone* to categorise and tag whatever training data you're using at some point. Unfortunately, most bits of data don't come with convenient pre-categorisations.

49

currentscurrents t1_j21fh9o wrote

The big thing these days is "self-supervised" learning.

You do the bulk of the training on a simpler task, like predicting missing parts of images or sentences. You don't need labels for this, and it allows the model to learn a lot about the structure of the data. Then you fine-tune the model with a small amount of labeled data for the specific task you want it to do.
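
To make the "no labels needed" part concrete, here is a toy sketch of how a fill-in-the-blank task produces its own targets from raw text. Everything in it is an assumption for illustration: the `[MASK]` placeholder, the function names, and the tiny counting "model" that only looks at the word to the left of the blank. Real systems train a neural network on the same kind of (masked input, hidden word) pairs.

```python
from collections import Counter, defaultdict

MASK = "[MASK]"  # hypothetical placeholder token for this sketch


def make_masked_examples(sentence):
    """Turn one unlabeled sentence into (masked_tokens, target) pairs.

    Each pair hides a single word. The hidden word is the training target,
    so the "label" comes from the data itself, not from a human annotator.
    """
    tokens = sentence.split()
    return [
        (tokens[:i] + [MASK] + tokens[i + 1:], tokens[i])
        for i in range(len(tokens))
    ]


def left_context(masked_tokens):
    """Return the word just before the mask, or "<s>" at sentence start."""
    i = masked_tokens.index(MASK)
    return masked_tokens[i - 1] if i > 0 else "<s>"


def train_fill_in(corpus):
    """Toy stand-in for pretraining: count which word fills each blank."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for masked, target in make_masked_examples(sentence):
            counts[left_context(masked)][target] += 1
    return counts


def predict(counts, masked_tokens):
    """Guess the hidden word from the most frequent filler seen in training."""
    left = left_context(masked_tokens)
    if left not in counts:
        return None
    return counts[left].most_common(1)[0][0]


corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]

for masked, target in make_masked_examples("the cat sat"):
    print(" ".join(masked), "->", target)
# [MASK] cat sat -> the
# the [MASK] sat -> cat
# the cat [MASK] -> sat

model = train_fill_in(corpus)
print(predict(model, "the [MASK] sat".split()))  # cat
```

Nobody tagged that corpus; the training pairs fall out of the sentences themselves. The fine-tuning step is where the small labeled set comes in.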

Not only does this require far less labeled data, it also lets you reuse the model - you don't have to repeat the first phase of training, just the fine-tuning. You can download pretrained models from Hugging Face and adapt them to your specific task.

15

-_1_2_3_- t1_j226dlz wrote

>There are a few misconceptions in the sentiment you mentioned. First, it is true that most data does not come with convenient pre-categorizations. However, this does not necessarily mean that someone needs to manually categorize and tag the data for it to be used in unsupervised learning.
>
>Unsupervised learning is a type of machine learning where the model is not given any labeled data or supervision. Instead, the model must learn to identify patterns and relationships in the data on its own. This is in contrast to supervised learning, where the model is given labeled data and is trained to predict a specific outcome based on this data.
>
>In unsupervised learning, the model does not need to be told what categories or tags to look for in the data. Instead, it can use techniques like clustering to group similar data points together and identify patterns in the data. This allows the model to discover and learn about the underlying structure of the data without the need for explicit categorization or tagging.
>
>Therefore, while it is often helpful to have some level of human annotation or labeling of data, it is not always necessary in unsupervised learning. The model can still learn and make useful predictions or discoveries even if the data is not explicitly labeled or categorized.
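
For anyone curious what the clustering mentioned in the quote looks like in practice, below is a minimal k-means sketch in plain Python. The quote does not name a specific algorithm, so k-means is my choice here, and the function name, the sample points, and the seeding with the first `k` points are all assumptions made to keep the example short and deterministic. It needs Python 3.8+ for `math.dist`.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats. Returns (centroids, assignments)."""
    # Assumption for this sketch: seed the centroids with the first k points
    # so the run is deterministic. Real code usually seeds randomly.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return centroids, assignments


# Six unlabeled points that happen to form two loose groups.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
]

centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The algorithm recovers the two groups without being told they exist, but note that it only hands back cluster numbers. Deciding what cluster 0 and cluster 1 actually *mean* is still up to a person.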

Written for you by an AI powered by unsupervised learning...

5