Submitted by Linear-- t3_11bcklh in MachineLearning

For the model, both SSL and SL basically require it to learn a mapping from X (input) to Y (label), or a probability distribution over the labels. And usually, the optimization processes for both are basically the same, at least for deep learning.

What's specific to SSL is just that the data is already labelled, so no extra labelling is required. This facilitates pre-training on a much larger dataset, since hand-labelling is expensive.
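A minimal sketch of that difference, assuming a PyTorch-style setup with made-up tensors (the rotation pretext task is just one common illustration of how SSL manufactures its own labels):

```python
import torch

# Unlabelled images: no human annotation anywhere.
images = torch.randn(8, 3, 32, 32)

# A classic pretext task: rotate each image by 0/90/180/270 degrees and ask
# the model to predict the rotation. The "label" comes for free from the data.
rotations = torch.randint(0, 4, (8,))
rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                       for img, k in zip(images, rotations)])

# From here on it is an ordinary supervised problem: learn a mapping X -> Y.
x, y = rotated, rotations
```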

0

Comments


mil24havoc t1_j9x8tol wrote

I generally agree with you. But it is useful to have a term for training methods that use clever tricks to bypass manual data labeling, usually with some secondary objective in mind (that the model should do something that is not strictly the same as the SSL objective). In that sense, I think of it as a subset of supervised learning. In ML, literally every innovation gets its own catchy name. This is in contrast to, say, statistics, where major innovations often aren't named until years later. I suspect this has to do with the hotness and competitiveness of ML - you need a catchy name to stand out in a crowd of thousands of papers doing very similar things.

15

cthorrez t1_j9x8x1b wrote

The methods and models are identical, yep. It's basically just to denote whether the labels were assigned by a human or determined automatically.

31

currentscurrents t1_j9xg9kn wrote

You're looking at the wrong level. SSL is a different training objective. Everything else about the model and optimizer is the same, but you're training it on a different problem.
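A rough sketch of that point, assuming a toy PyTorch model (the pretext label here, computed from `x` itself, is an invented stand-in for a real SSL objective): the training step is identical, and only where `y` comes from changes.

```python
import torch
import torch.nn as nn

# Same model, same optimizer, same loss family for both paradigms.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 32)
y_human = torch.randint(0, 10, (16,))   # SL: labels from a human annotator
y_pretext = x[:, :10].argmax(dim=1)     # SSL: labels computed from x itself

for y in (y_human, y_pretext):          # identical optimization step either way
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```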

Also SSL has other advantages beyond being cheaper. SL can only teach you ideas humans already know, while SSL learns from the data directly. It would be fundamentally impossible to create labels for every single concept a large model like GPT-3 knows.

Yann LeCun is almost certainly right that most human learning is SSL. Very little of our input data is labeled -- and for animals, possibly none.

9

KingsmanVince t1_j9xk6rf wrote

>Isn't self-supervised learning(SSL) simply a kind of SL?

Don't their names already tell you that? Self-supervised learning... supervised learning...

>So I think classifying them as disjoint is somewhat misleading.

Who said this?

The ways the two paradigms determine labels are different (as u/cthorrez said). Moreover, the objectives are different (as u/currentscurrents said).

7

Linear-- OP t1_j9xqtsx wrote

It's clear that humans and other animals must also learn with reinforcement -- requiring the agent to act and receive feedback/reward. This is an important part, and I don't think it's proper to classify it as SSL. Moreover, the psychology of learning points out that problem-solving and immediate feedback are very important for learning outcomes -- this feedback is typically a human label or reward signal.

1

KingsmanVince t1_j9xr8oe wrote

>Not so constructive.

It's not much, I am aware. However, what I mean is that the names of both training paradigms already tell you part of the answer. My last paragraph refers to two other comments to build a more complete answer.

Moreover, the names already indicate that the two are related. Therefore, this line

>So I think classifying them as disjoint is somewhat misleading.

is obvious. I don't know who said "classifying them as disjoint" to you. Clearly they didn't pay attention to the names.

4

Linear-- OP t1_j9xt0nh wrote

I've now done some further research and read the comments.

So far, my conclusion is that SSL is indeed a type of SL: the data contains features and corresponding label(s). From Wikipedia:

>Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labeled examples, meaning that each data point contains features (covariates) and an associated label.

Since this is not a debate, I do not want to dwell on the definition. And indeed, *self-*supervised means that it does not require extra resource-consuming labelling from humans, making training on huge datasets possible, as with GPT-3.

And I disagree that seeing SSL as a kind of SL is the "wrong level", as a comment suggested. What I originally intended to confirm was that language modeling, which gives rise to GPT-3/ChatGPT, is a kind of supervised learning with a large quantity (and sometimes good quality) of data. A strong model with simple, old methods.
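For concreteness, a tiny illustration of that claim (pure Python, with a made-up sentence and vocabulary): language modelling just manufactures (input, label) pairs from raw text by shifting it.

```python
# Raw text, no annotation; the sentence and vocabulary are made up.
text = "language modelling is just supervised learning on shifted text"
vocab = {w: i for i, w in enumerate(sorted(set(text.split())))}
ids = [vocab[w] for w in text.split()]

# Every prefix is an input X and the next word is its label Y.
pairs = [(ids[:t], ids[t]) for t in range(1, len(ids))]
for x, y in pairs[:3]:
    print(x, "->", y)
```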

−1

Linear-- OP t1_j9xu7pn wrote

You cannot just confidently infer meaning from a name. Is a "light year" a unit of time?

By your logic, "unsupervised learning" is not supervised learning; yet SSL is sometimes classified as part of unsupervised learning, so now SSL isn't SL either!

So "I think classifying them as disjoint is somewhat misleading."

is obvious.

My fault, deleted. Satisfied now?

−2

paradigmai t1_j9xyrcy wrote

IMO, although the optimization techniques are the same, it is important to make this distinction because SSL does not require curated labels. And in some use cases SSL is not an option at all.

1

visarga t1_j9y7fro wrote

Words in language are both observations and actions. So language modelling is also a kind of supervised policy learning?

So... Self Supervised Learning is Unsupervised & Supervised & Reinforcement Learning.

3

currentscurrents t1_j9yxr37 wrote

Look up predictive coding; neuroscientists came up with it in the 80s and 90s.

A good portion of learning works by trying to predict the future and updating your brain's internal model when you're wrong. This is especially involved in perception and world modeling tasks, like vision processing or commonsense physics.

You would have a very hard time learning this from RL. Rewards are sparse in the real world, and if you observe something that doesn't affect your reward function, RL can't learn from it. But predictive coding/self-supervised learning can learn from every bit of data you observe.
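A toy sketch of that contrast (made-up observation stream, PyTorch): the self-supervised predictor gets a training signal at every step, whereas a purely reward-driven learner only sees useful feedback at the one step where the sparse reward arrives.

```python
import torch
import torch.nn as nn

obs = torch.randn(100, 8)   # a stream of 100 observations
reward = torch.zeros(100)   # sparse reward: only the final step pays off
reward[-1] = 1.0

predictor = nn.Linear(8, 8)
opt = torch.optim.SGD(predictor.parameters(), lr=1e-2)

# Predictive-coding / self-supervised style: the target is simply the next
# observation, so every single step produces a prediction error to learn from.
for t in range(99):
    loss = (predictor(obs[t]) - obs[t + 1]).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```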

You do also use RL, because there are some things you can only learn through RL. But this becomes much easier once you already have a rich mental model of the world. Getting good at predicting the future makes you very good at predicting what will maximize your reward.

6

currentscurrents t1_ja5isuz wrote

Imagine you need to cook some food. None of the steps of cooking give you any reward, you only get the reward at the end.

Pure RL will quickly teach you not to touch the burner, but it really struggles with tasks that involve planning or delayed rewards. Self-supervised learning helps with this by building a world model that you can use to predict future rewards.

1

AmalgamDragon t1_ja5lz5b wrote

This really comes down to how 'reward' is defined. I think we likely disagree on that definition, with yours being a lot narrower than mine. For example, during the cooking process, there is usually a point before the meal is done where it 'smells good', which is a reward. There's dopamine release as well, which could be triggered when completing some of the steps (I don't know if that's the case or not), but simply observing that a step is complete is rewarding for lots of folks.

> Pure RL will quickly teach you not to touch the burner, but it really struggles with tasks that involve planning or delayed rewards.

Depends on which algorithms you're using, but PPO can handle this quite well.

1

currentscurrents t1_ja5n5xi wrote

Those are all internal rewards, which your brain creates because it knows (according to the world model) that these events lead to real rewards. It can only do this because it has learned to predict the future.

>PPO can handle this quite well.

"Quite well" is still trying random actions millions of times. World modeling allows you to learn from two orders of magnitude less data.

2