Submitted by Linear-- t3_11bcklh in MachineLearning

For the model, basically, both SSL and SL require it to learn a mapping from X (input) to Y (label), or a probability distribution over the labels. And usually, the optimization processes for both are basically the same, at least for deep learning.

What's specific to SSL is just that the labels come from the data itself, so no extra labelling is required. This facilitates pre-training on much larger datasets, since hand-labelling is expensive.
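A toy sketch of what I mean by the labels coming from the data itself, using the classic "predict the rotation" pretext task as an example (the random images and dimensions here are just placeholders):

```python
# Toy sketch: a pretext task that "labels itself" -- predict how an image was rotated.
import numpy as np

rng = np.random.default_rng(0)
unlabeled_images = [rng.random((32, 32)) for _ in range(100)]   # stand-in for raw, unlabelled images

pairs = []
for img in unlabeled_images:
    k = rng.integers(0, 4)                  # rotation class: 0, 90, 180 or 270 degrees
    pairs.append((np.rot90(img, k), k))     # (X = rotated image, Y = rotation class), no human labelling
```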

0

Comments


cthorrez t1_j9x8x1b wrote

The methods and models are identical, yep. The term is basically just there to denote whether the labels were assigned by a human or determined automatically.

31

visarga t1_j9y79v0 wrote

But the text coming from a human should be considered "manually" labelled, right?

1

mil24havoc t1_j9x8tol wrote

I generally agree with you. But it is useful to have a term for training methods that use clever tricks to bypass manual data labeling, usually with some secondary objective in mind (that the model should do something that is not strictly the same as the SSL objective). In that sense, I think of it as a subset of supervised learning. In ML, literally every innovation gets its own catchy name. This is in contrast to, say, statistics, where major innovations often aren't named until years later. I suspect this has to do with the hotness and competitiveness of ML - you need a catchy name to stand out in a crowd of thousands of papers doing very similar things.

15

currentscurrents t1_j9xg9kn wrote

You're looking at the wrong level. SSL is a different training objective. Everything else about the model and optimizer is the same, but you're training it on a different problem.
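Roughly, something like this sketch (PyTorch, toy dimensions, made-up data): the model and the optimizer step are identical, and the only difference is where Y comes from.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(32, 10)          # stand-in for any network
opt = torch.optim.Adam(model.parameters())

def train_step(x, y):                    # the same update rule in both paradigms
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

x = torch.randn(8, 32)
y_human = torch.randint(0, 10, (8,))     # SL: classes annotated by a person
y_pretext = torch.randint(0, 10, (8,))   # SSL: targets derived from the data itself (e.g. masked-token ids)
train_step(x, y_human)                   # supervised objective
train_step(x, y_pretext)                 # self-supervised objective -- same model, same optimizer
```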

Also SSL has other advantages beyond being cheaper. SL can only teach you ideas humans already know, while SSL learns from the data directly. It would be fundamentally impossible to create labels for every single concept a large model like GPT-3 knows.

Yann LeCun is almost certainly right that most human learning is SSL. Very little of our input data is labeled - and for animals, possibly none.

9

Linear-- OP t1_j9xqtsx wrote

It's clear that humans and other animals must also learn with reinforcement -- requiring the agent to act and receive feedback/reward. This is an important part of learning and I don't think it's proper to classify it as SSL. Moreover, the psychology of learning points out that problem-solving and immediate feedback are very important for learning outcomes -- and this feedback typically takes the form of human labels or reward signals.

1

currentscurrents t1_j9yxr37 wrote

Look up predictive coding; neuroscientists came up with it in the 80s and 90s.

A good portion of learning works by trying to predict the future and updating your brain's internal model when you're wrong. This is especially involved in perception and world modeling tasks, like vision processing or commonsense physics.
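In ML terms it's basically online self-supervised regression on your own sensory stream -- very loosely, something like this toy sketch (the linear predictor and the toy dynamics are made up; real predictive coding is hierarchical):

```python
# Toy sketch: keep predicting the next observation and learn only from the prediction error.
import numpy as np

rng = np.random.default_rng(0)
shift = np.roll(np.eye(4), 1, axis=1)      # toy "world" dynamics, unknown to the learner
W = np.zeros((4, 4))                       # internal model: predicts the next observation from the current one
lr = 0.05

obs = rng.normal(size=4)
for _ in range(1000):
    next_obs = obs @ shift + 0.01 * rng.normal(size=4)
    error = next_obs - obs @ W             # prediction error is the only learning signal
    W += lr * np.outer(obs, error)         # update the internal model where it was wrong
    obs = next_obs
# after training, W approximates the true dynamics without any labels or rewards
```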

You would have a very hard time learning this from RL. Rewards are sparse in the real world, and if you observe something that doesn't affect your reward function, RL can't learn from it. But predictive coding/self-supervised learning can learn from every bit of data you observe.

You do also use RL, because there are some things you can only learn through RL. But this becomes much easier once you already have a rich mental model of the world. Getting good at predicting the future makes you very good at predicting what will maximize your reward.

6

AmalgamDragon t1_j9zyyib wrote

> Rewards are sparse in the real world

This doesn't seem true. The only reason we aren't getting negative rewards (e.g. pain, discomfort, etc.) constantly is that we learn to generally avoid them.

2

currentscurrents t1_ja5isuz wrote

Imagine you need to cook some food. None of the steps of cooking gives you any reward; you only get the reward at the end.

Pure RL will quickly teach you not to touch the burner, but it really struggles with tasks that involve planning or delayed rewards. Self-supervised learning helps with this by building a world model that you can use to predict future rewards.

1

AmalgamDragon t1_ja5lz5b wrote

This really comes down to how 'reward' is defined. I think we likely disagree on that definition, with yours being a lot narrower than mine. For example, during the cooking process there is usually a point before the meal is done where it 'smells good', which is a reward. There's dopamine release as well, which could be triggered when completing some of the steps (I don't know whether that's the case or not), but simply observing that a step is complete is rewarding for lots of folks.

> Pure RL will quickly teach you not to touch the burner, but it really struggles with tasks that involve planning or delayed rewards.

Depends on which algorithms you're using, but PPO can handle this quite well.
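For instance, a minimal sketch with stable-baselines3 (CartPole is just a stand-in environment; the point is that the discounting and GAE settings are what credit a reward received late in an episode back to the earlier actions that set it up):

```python
# Minimal sketch -- assumes gymnasium and stable-baselines3 are installed.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # swap in your own delayed/sparse-reward task
model = PPO("MlpPolicy", env, gamma=0.99, gae_lambda=0.95, verbose=0)
model.learn(total_timesteps=10_000)
```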

1

visarga t1_j9y7fro wrote

Words in language are both observations and actions. So language modelling is also a kind of supervised policy learning?

So... Self Supervised Learning is Unsupervised & Supervised & Reinforcement Learning.

3

KingsmanVince t1_j9xk6rf wrote

>Isn't self-supervised learning(SSL) simply a kind of SL?

Don't their names already tell you that? Self-supervised learning... supervised learning...

>So I think classifying them as disjoint is somewhat misleading.

Who said this?

The ways the two paradigms determine their labels are different (as u/cthorrez said). Moreover, the objectives are different (as u/currentscurrents said).

7

Siltala t1_j9xwvyh wrote

Why does it not stand for Sexual Learning? I see a business opportunity…

3

Linear-- OP t1_j9xpz00 wrote

So you want to argue that the title of the post is trivially true, so not worth mentioning, and problematic (as your last paragraph suggests)? Not so constructive.

−8

KingsmanVince t1_j9xr8oe wrote

>Not so constructive.

I'm aware it's not much. However, what I mean is that the names of both training paradigms already tell you part of the answer. My last paragraph refers to two other comments to build a more complete answer.

Moreover, the names already point out that they're somewhat related. Therefore, this line

>So I think classifying them as disjoint is somewhat misleading.

is obvious. I don't know who said "classifying them as disjoint" to you. Clearly they didn't pay attention to the names.

4

Linear-- OP t1_j9xu7pn wrote

You cannot just confidently infer the meaning from the name. Is a "light year" a unit of time?

By your logic, "unsupervised learning" is not supervised learning; yet SSL is sometimes classified as a part of unsupervised learning, so now SSL isn't SL either!

So "I think classifying them as disjoint is somewhat misleading."

is obvious.

My fault, deleted. Satisfied now?

−2

paradigmai t1_j9xyrcy wrote

IMO, although the optimization techniques are the same, it is important to make this distinction because SSL does not require curated labels. And in some use cases SSL is not an option at all.

1

Linear-- OP t1_j9xt0nh wrote

I've now done some further research and read the comments.

So far, my conclusion is that SSL is indeed a type of SL: the data contains features and corresponding label(s). From Wikipedia:

>Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labeled examples, meaning that each data point contains features (covariates) and an associated label.

Since this is not a debate, I do not want to dwell on the definition. And indeed, *self*-supervised means that it does not require extra, resource-consuming labelling from humans, which makes training on huge datasets possible, as with GPT-3.

And I disagree that seeing SSL as a kind of SL is the "wrong level", as one comment suggested. What I originally intended to confirm was that language modeling, which gives rise to GPT-3/ChatGPT and the like, is a kind of supervised learning with a large quantity (and sometimes good quality) of data. Strong models from simple, old methods.
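To make that concrete, here is roughly how next-token language modeling turns raw text into the (features, label) pairs from the Wikipedia definition above (a word-level toy sketch, not how GPT-3 actually tokenizes):

```python
# Toy sketch: next-token prediction builds labelled examples from raw text alone.
text = "strong model with simple old methods"
tokens = text.split()

examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for features, label in examples:
    print(features, "->", label)
# ['strong'] -> model
# ['strong', 'model'] -> with
# ...
```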

−1