StrippedSilicon t1_jdt7h5o wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
So... how exactly does it solve a complicated math problem it hasn't seen before if it's only regurgitating information?
StrippedSilicon t1_jdrldvz wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
"Recontextualizing information" isn't an unfair description, but I'm not sure it really explains things like the example in 4.4, where it answers a math Olympiad question that there's no way was in the training set (assuming they're being honest about the training set). I don't know how a model can arrive at the answer it does without some kind of deeper understanding than just putting existing information together in a different order. Maybe the most correct thing is simply to admit we don't really know what's going on, since 100 billion parameters, or however big GPT-4 is, is beyond simple interpretation.
"Open"AI's recent turn to secrecy isn't helping things either.
StrippedSilicon t1_jdnukc7 wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
People who point to this paper to claim sentience or AGI or whatever are obviously wrong, it's nothing of the sort. Still, saying that it's just memorizing is also very silly, given it can answer questions that aren't in the training data, or even particularly close to anything in the training data.
StrippedSilicon t1_jdmhdac wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
You are wrong, it does well on problems completely outside of its training data. There's a good look at this here: https://arxiv.org/abs/2303.12712
It's obviously not just memorizing; it has some kind of "understanding" to be able to do this.
StrippedSilicon t1_jdte8lj wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
That's why I'm appealing to the "we don't actually understand what it's doing" case. Certainly the AGI-like intelligence explanation falls apart in a lot of cases, but the explanation that it's only spitting out the training data in a different order or context doesn't work either.