visarga t1_j39xs2x wrote
Reply to comment by Scarlet_pot2 in We need more small groups and individuals trying to build AGI by Scarlet_pot2
No, this concept is older; it predates Google. Hinton was working on it in 1986 and Schmidhuber in the 1990s. By the way, "next token prediction" is not necessarily state of the art. The UL2 paper showed that a mixture of denoising objectives, including masked spans, works better.
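To make the contrast concrete, here is a toy sketch of span corruption, the kind of masked-span objective UL2 mixes in. This is my own simplified illustration with T5-style sentinel names, not the UL2 code; the real setup mixes several denoisers with different span lengths and corruption rates.

```python
# Toy span corruption: mask contiguous spans and ask the model to reconstruct them.
# Contrast with next-token prediction, where the target is just the input shifted by one.
import random

def corrupt_spans(tokens, span_len=3, corruption_rate=0.15, seed=0):
    """Replace masked spans with sentinel tokens; return (inputs, targets)."""
    rng = random.Random(seed)
    n_to_mask = max(1, int(len(tokens) * corruption_rate))
    n_spans = max(1, n_to_mask // span_len)

    # Pick span start positions (simplified; real implementations avoid overlaps properly).
    starts = sorted(rng.sample(range(0, len(tokens) - span_len), n_spans))

    inputs, targets = [], []
    pos, sentinel = 0, 0
    for s in starts:
        if s < pos:
            continue  # skip overlapping spans in this toy version
        inputs.extend(tokens[pos:s])
        inputs.append(f"<extra_id_{sentinel}>")
        targets.append(f"<extra_id_{sentinel}>")
        targets.extend(tokens[s:s + span_len])
        pos = s + span_len
        sentinel += 1
    inputs.extend(tokens[pos:])
    return inputs, targets

tokens = "the quick brown fox jumps over the lazy dog near the river bank".split()
inp, tgt = corrupt_spans(tokens)
print("input: ", " ".join(inp))
print("target:", " ".join(tgt))
```

The model only has to generate the short target sequence (the masked spans), which is part of why these objectives can be more efficient per training step than plain left-to-right prediction.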
If you follow the new papers, there are a thousand ideas floating around: how to make models learn better, how to make them smaller, how to teach networks to compose separate skills, why training on code improves reasoning, how to generate problem solutions as training data... We just don't know which of them will matter down the line, and it takes a lot of time to try them out.
Here's a weird new idea: StitchNet: Composing Neural Networks from Pre-Trained Fragments (link). People try anything and everything.
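As a rough illustration of the fragment-composition idea (not StitchNet's actual algorithm, which selects fragments by comparing their activations), you can glue the front of one pretrained network to the back of another through a small stitching layer. This sketch assumes torchvision and its downloadable ImageNet weights; the stitch layer here is untrained and only matches shapes.

```python
# Glue the front half of a pretrained ResNet-18 to the back half of a pretrained
# ResNet-34 with a 1x1 conv "stitching" layer between them.
import torch
import torch.nn as nn
from torchvision.models import resnet18, resnet34

front = resnet18(weights="IMAGENET1K_V1")
back = resnet34(weights="IMAGENET1K_V1")

# Fragment A: resnet18 up to the end of layer2 (outputs 128 channels).
fragment_a = nn.Sequential(front.conv1, front.bn1, front.relu, front.maxpool,
                           front.layer1, front.layer2)
# Fragment B: resnet34 from layer3 onward (expects 128-channel input).
fragment_b = nn.Sequential(back.layer3, back.layer4, back.avgpool)

stitch = nn.Conv2d(128, 128, kernel_size=1)  # adapter between fragments (untrained here)

x = torch.randn(1, 3, 224, 224)
h = stitch(fragment_a(x))
logits = back.fc(torch.flatten(fragment_b(h), 1))
print(logits.shape)  # torch.Size([1, 1000])
```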
Or this one: Massive Language Models Can Be Accurately Pruned in One-Shot (link). Maybe it means we will be able to run GPT-3-size models on a gaming desktop instead of a $150,000 machine.
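For intuition, here is the simplest possible pruning sketch: plain magnitude pruning in PyTorch. This is not the paper's one-shot method (which uses approximate second-order information to decide which weights to drop); it only shows what "zeroing half the weights" looks like, and why that shrinks the storage and compute you need.

```python
# Toy magnitude pruning: zero out the smallest-magnitude weights of a linear layer.
import torch
import torch.nn as nn

def magnitude_prune_(linear: nn.Linear, sparsity: float = 0.5) -> None:
    """Set the `sparsity` fraction of smallest-magnitude weights to zero, in place."""
    w = linear.weight.data
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values
    w[w.abs() <= threshold] = 0.0

layer = nn.Linear(4096, 4096)
magnitude_prune_(layer, sparsity=0.5)
density = (layer.weight != 0).float().mean().item()
print(f"remaining nonzero weights: {density:.0%}")  # roughly 50%
```

Actually getting a speed or memory win from that sparsity still depends on kernels and storage formats that exploit it, which is the harder part in practice.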