Cryptizard t1_j3998va wrote

Who exactly are you crediting with inventing this "guess the next word" approach?

Scarlet_pot2 OP t1_j39cjh8 wrote

It was a small group of engineers at Google, not highly funded. They were trying to build something for Google Translate when they figured out they could make a program that guesses the next word.

visarga t1_j39xs2x wrote

No, this concept is older; it predates Google. Hinton was working on it in 1986 and Schmidhuber in the 1990s. By the way, "next token prediction" is not necessarily state of the art. The UL2 paper showed it is better to train on a mix of objectives built mostly on masked spans of different lengths.
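A rough sketch of the difference between the two objectives, shown as the training pairs each one builds. This is a simplified illustration assuming whitespace tokens and fixed-length, aligned spans; UL2's actual mixture varies span lengths and corruption rates, and the sentinel names `<X0>`, `<X1>` are hypothetical.

```python
import random

def next_token_pairs(tokens):
    # Causal LM: every prefix is an input, the token right after it is the target.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def span_corruption(tokens, span_len=2, n_spans=1, seed=0):
    # Simplified span corruption: hide n_spans non-overlapping spans of span_len
    # tokens behind sentinels; the target spells out what each sentinel replaced.
    rng = random.Random(seed)
    # Candidate starts are multiples of span_len, so chosen spans never overlap.
    candidates = range(0, len(tokens) - span_len + 1, span_len)
    starts = sorted(rng.sample(candidates, n_spans))
    inputs, targets, pos = [], [], 0
    for k, start in enumerate(starts):
        sentinel = f"<X{k}>"  # hypothetical sentinel naming
        inputs.extend(tokens[pos:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:start + span_len])
        pos = start + span_len
    inputs.extend(tokens[pos:])
    return inputs, targets

tokens = "the cat sat on the mat".split()
print(next_token_pairs(tokens)[:2])  # → [(['the'], 'cat'), (['the', 'cat'], 'sat')]
print(span_corruption(tokens, span_len=2, n_spans=1))
```

Next-token prediction only ever sees left context, while span corruption lets the model use words on both sides of the gap.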

If you follow the new papers, there are a thousand ideas floating around. How to make models learn better, how to make them smaller, how to teach the network to compose separate skills, why training on code improves reasoning skills, how to generate problem solutions as training data... we just don't know which are going to matter down the line. It takes a lot of time to try them out.

Here's a weird new idea: "StitchNet: Composing Neural Networks from Pre-Trained Fragments" (link). People try anything and everything.

Or this one: "Massive Language Models Can Be Accurately Pruned in One-Shot" (link). Maybe it means we will be able to run GPT-3-size models on a gaming desktop instead of a $150,000 computer.
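To illustrate what "pruning in one shot" means, here is a sketch that swaps in plain magnitude pruning, the simplest one-shot baseline: it zeroes the smallest-magnitude weights of a layer once, with no retraining. The paper's own method is more sophisticated than this; the function name and the toy matrix are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of a 2-D weight matrix with the smallest-|w| fraction zeroed."""
    rows, cols = len(weights), len(weights[0])
    k = int(rows * cols * sparsity)  # how many weights to remove
    # Rank every (row, col) position by absolute weight, smallest first.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    pruned = [row[:] for row in weights]  # leave the input untouched
    for r, c in order[:k]:
        pruned[r][c] = 0.0
    return pruned

W = [[0.5, -0.1, 0.3], [-0.05, 0.9, -0.4]]
print(magnitude_prune(W, 0.5))  # → [[0.5, 0.0, 0.0], [0.0, 0.9, -0.4]]
```

Zeroed weights only save memory if the matrix is then stored in a sparse format; a dense matrix full of zeros takes the same space as before.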

Cryptizard t1_j39dcvq wrote

I can’t find any evidence of this happening.

Scarlet_pot2 OP t1_j39g574 wrote

https://en.wikipedia.org/wiki/Word2vec

"Word2vec is a technique for natural language processing (NLP) published in 2013 (Google). The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text."

This was the first "guess the next word" model.
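For reference, word2vec's skip-gram variant is trained to predict the words surrounding a center word within a window, not strictly the next word (its CBOW variant predicts the center word from that context). A minimal sketch of how skip-gram training pairs are built, assuming whitespace tokens; the function name and window size are illustrative.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: each word is trained to predict its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["a", "b", "c"], window=1))
# → [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```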

https://towardsdatascience.com/attention-is-all-you-need-discovering-the-transformer-paper-73e5ff5e0634

This next link is an article about the "Attention Is All You Need" paper, which described how to build a transformer model for the first time.
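The core operation that paper introduced is scaled dot-product attention, softmax(QKᵀ/√d_k)V: each query scores every key, the scores become weights, and the output is a weighted average of the values. A minimal single-head sketch in plain Python lists; a real transformer layer adds learned projections, masking, and multiple heads, none of which are shown here.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

V = [[1.0, 2.0], [3.0, 4.0]]
print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], V))  # → [[2.0, 3.0]]
```

A zero query scores every key equally, so the output is just the mean of the values.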

These two discoveries didn't take millions or billions in funding. They were made by small groups of passionate people, and their work led to the LLMs of today. We need to find new methods that would be similarly disruptive when extrapolated out, and the more people we have working on it, the better chance we have of finding things like these. IMO these are parts of the future AGI, or at least important steps towards it. It doesn't take ungodly amounts of money to make important innovations like these.

Cryptizard t1_j39gpo3 wrote

They all have PhDs in AI though…

Scarlet_pot2 OP t1_j39hw2h wrote

Let's say there's a self-funded group of passionate PhDs, and over time they have a 20% chance of finding an innovation or discovery in AI.

Now let's say there is another self-funded group of intermediates and beginners, and over time they have a 2% chance of making a discovery in AI.

But in the second example, there are 10 of those teams. All the teams mentioned are trying different things. If the end goal is advancement towards AGI, they should all be encouraged to keep trying and sharing, right?

Cryptizard t1_j39jqjy wrote

I am claiming, though, that amateurs and enthusiasts are incapable of contributing to state-of-the-art AI. There is too much accumulated knowledge. If there were a low but nonzero chance of just making AGI from first principles, it would have already happened at some point in the last 50 years that people have been working on it. If, however, it is like every other field of science, you need to build the next thing with at least a deep understanding of the previous thing.

Your examples might not have had a lot of money, but they all certainly were experts in AI and knew what they were doing.
