Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
soggy_mattress t1_jdl4zkg wrote
I think of the 100B-parameter models as analogous to the first room-sized computers that were built in the 1940s and '50s. Seems the pattern is to first prove a concept, no matter how inefficiently, and then optimize it as much as possible.
Vegetable-Skill-9700 OP t1_jdl680d wrote
That's an interesting analogy!
Short_Change t1_jdlp0cw wrote
Actually, if his analogy holds, we'll eventually have 20-trillion-parameter models for mainstream consumer use.
Crystal-Ammunition t1_jdlpzsw wrote
At that point, the training data would have to be almost completely synthetic, right?
EmmyNoetherRing t1_jdma3em wrote
Introspection? Cog-sci and classical AI folks like to use the term, not always in the best-justified fashion, I think. But when you're hallucinating your own new training data, it seems relevant.
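In its simplest form that's just a generate-filter-retrain loop; a hand-wavy sketch, where `quality_filter` is a hypothetical placeholder for the genuinely hard part (reward models, heuristics, or human review):

```python
# Hand-wavy sketch of a self-generated ("hallucinated") training-data loop.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/pythia-160m")

def quality_filter(text: str) -> bool:
    # Hypothetical placeholder -- in practice a reward model, heuristics, or humans.
    return len(text.split()) > 20

prompts = ["Explain why the sky is blue.", "Summarize the Chinchilla scaling laws."]
synthetic = []
for prompt in prompts:
    out = generator(prompt, max_new_tokens=64, do_sample=True)[0]["generated_text"]
    if quality_filter(out):
        synthetic.append({"prompt": prompt, "completion": out})
# `synthetic` would then be mixed into the next round of fine-tuning data.
```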
currentscurrents t1_jdmyjrb wrote
Bigger models are more sample-efficient, so they should need less data.
But - didn't the Chinchilla paper say bigger models need more data? Yes, but that's only true because right now compute is the limiting factor. They're intentionally trading off more data for less model size.
As computers get faster and models bigger, data will increasingly become the limiting factor, and people will trade off in the opposite direction instead.
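For intuition, here's a rough sketch of that trade-off using the commonly cited approximations from the Chinchilla paper (training compute C ≈ 6·N·D FLOPs and roughly 20 tokens per parameter at the compute-optimal point; the exact constants are assumptions that vary between papers):

```python
# Rough Chinchilla-style allocation of a fixed compute budget (approximate constants).
# Assumes training compute C ~ 6 * N * D FLOPs and compute-optimal D ~ 20 * N tokens.

def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Return (params N, tokens D) that roughly exhaust a FLOPs budget."""
    # Solve 6 * N * (20 * N) = C  ->  N = sqrt(C / 120)
    n_params = (flops_budget / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

for c in (1e21, 1e23, 1e25):
    n, d = compute_optimal_split(c)
    print(f"C={c:.0e} FLOPs -> ~{n / 1e9:.0f}B params, ~{d / 1e12:.2f}T tokens")
```

Under this rule both parameters and tokens grow with compute; the interesting question is what happens once the token count is capped by the data that actually exists.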
itshouldjustglide t1_jdoazux wrote
Don't bigger models need more data so that all of the parameters actually get trained, reducing unnecessary noise and randomness?
ganzzahl t1_jdovu3h wrote
I'm also very interested in this – does anyone have papers similar to Chinchilla, but without the training FLOPs restriction, and instead comparing identical dataset sizes?
An aside: I feel like I remember some older MT papers where LSTMs outperformed Transformers for some low resource languages, but I think that's outdated – using transfer learning, multilingual models and synthetic data, I'm fairly certain Transformers always outperform nowadays.
PilotThen t1_jdpnoul wrote
I didn't find a paper, but I think that's sort of what EleutherAI was doing with their Pythia models.
You'll find the models on Hugging Face, and I'd say they're also interesting from an open-source perspective because of their license (Apache-2.0).
(Also, Open Assistant seems to be building on top of them.)
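For anyone who wants to poke at them, a minimal sketch of loading one of the Pythia checkpoints with the standard Hugging Face transformers API (the 160M size is just the smallest convenient example; the larger checkpoints follow the same naming scheme):

```python
# Minimal sketch: load a small Pythia checkpoint from the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # other sizes: 410m, 1.4b, 2.8b, 6.9b, 12b
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Scaling laws suggest that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```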
AllowFreeSpeech t1_je3rjmv wrote
20:1 ratio of tokens:params
I_will_delete_myself t1_jdnrr46 wrote
At that point we will run out of data. It will require more data-efficient methods.
hadaev t1_jdlym7s wrote
Idk, internet is big.
CacheMeUp t1_jdxvq8t wrote
Perhaps the challenge is not the size of the internet (it's indeed big and easy to generate new content), but rather the uniqueness and novelty of the information. Anecdotally, looking at the first page of Google results often shows various low-informativeness webpages, where only a few sentences provide information and the rest is boilerplate, disclaimers, generic advice or plain spam.
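That's basically why the big web corpora get run through aggressive heuristic filtering before training; a crude sketch of that kind of filter (the thresholds are made up for illustration, not taken from any particular paper):

```python
# Crude heuristic quality filter for scraped web pages (thresholds are illustrative only).

def looks_informative(page_text: str) -> bool:
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    words = page_text.split()
    if len(words) < 50:                                # too short to carry much signal
        return False
    if lines and len(set(lines)) / len(lines) < 0.7:   # heavy repeated boilerplate/nav text
        return False
    if lines and len(words) / len(lines) < 5:          # mostly menus, buttons, link lists
        return False
    return True

# Applied per document before a page is allowed into a training corpus.
```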
phb07jm t1_jdmp7kc wrote
I think this will prove prophetic
Thebadwolf47 t1_jdnbfya wrote
Wasn't he comparing the parameter count to the physical volume of those first computers, rather than to their transistor count?
Puzzleheaded_Acadia1 t1_jdn6ugl wrote
I see a future where LLMs, LLaMA-style models that are multimodal, or some other new kind of artificial intelligence run on ESP32-level hardware. I don't know how that will work, but I'm pretty sure we're heading there.
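For scale, a quick back-of-the-envelope on why that's a stretch on today's ESP32-class parts (the classic ESP32 has roughly 520 KB of on-chip SRAM, and some modules add a few MB of external PSRAM; the numbers below are rough assumptions):

```python
# Rough memory math for a quantized model on microcontroller-class hardware.

def model_bytes(n_params: float, bits_per_weight: int) -> float:
    return n_params * bits_per_weight / 8

ESP32_SRAM = 520 * 1024        # ~520 KB on-chip SRAM (classic ESP32, approximate)
ESP32_PSRAM = 8 * 1024 ** 2    # some modules add ~8 MB external PSRAM

for n_params in (1e6, 10e6, 100e6):                   # 1M, 10M, 100M parameters
    size = model_bytes(n_params, bits_per_weight=4)   # aggressive 4-bit quantization
    print(f"{n_params / 1e6:.0f}M params @ 4-bit ≈ {size / 1024 ** 2:.1f} MiB, "
          f"fits in PSRAM: {size <= ESP32_PSRAM}")
```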
atheist-projector t1_jdm7hw1 wrote
Especially when you consider that SGD only finds local minima. We can probably do a whole lot better if we find a nicer optimizer.
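The optimizer really is just a swappable piece of the training loop; a minimal PyTorch sketch on a toy model (nothing here is LLM-specific):

```python
# Minimal sketch: the optimizer is a drop-in component of a PyTorch training loop.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
data = torch.randn(64, 16)
target = torch.randn(64, 1)
loss_fn = nn.MSELoss()

# Swap between a plain SGD baseline and a more adaptive optimizer.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()
    optimizer.step()
```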