mycall t1_j51xq1r wrote on January 19, 2023 at 8:48 PM

Reply to comment by EmmyNoetherRing in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon

Loss/cost functions are used to optimize the model during training. The objective is almost always to minimize the loss function. The lower the loss the better the model. Cross-Entropy loss is a most important cost function. It is used to optimize classification models. The understanding of Cross-Entropy is pegged on understanding of Softmax activation function.

EmmyNoetherRing t1_j522inn wrote on January 19, 2023 at 9:16 PM

So I'm in a different flavor of data science, which means I've got the basic terminology, but not the specifics. I know what a loss function is and what entropy is. What role does "cross" play here? A cross between what?

EmmyNoetherRing t1_j5253a8 wrote on January 19, 2023 at 9:31 PM

>Softmax activation function

Ok, got it. huh (on reviewing wikipedia). so to rephrase the quoted paragraph, they find that the divergence between the training and testing distribution (between the compressed versions of the training and testing data sets in my analogy) starts decreasing smoothly as the scale of the model increases, long before the actual final task performance locks into place successfully.

Hm. Says something more about task complexity (maybe in some computability sense, a fundamental task complexity, that we don't have well defined for those types of tasks yet?). Rather than imagination I think, but I'm still with you on imagination being a factor, and of course the paper and the blog post both leave the cliff problem unsolved. Possibly there's a definition of imagination such that we can say degree X of it is needed to successfully complete those tasks.