hebekec256 OP t1_jbz0mpm wrote on March 12, 2023 at 8:40 PM

Reply to comment by MinaKovacs in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256

Yes, I understand that, but LLMs and extensions of LLMs (like PALM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said, "screw it", and scaled it from 500B to 50T parameters. I'm guessing there are reasons in the architecture that it would just break; otherwise I can't see why they wouldn't do it, since the risk-to-reward ratio seems favorable to me.

TemperatureAmazing67 t1_jbzcn6a wrote on March 12, 2023 at 10:05 PM

>extensions of LLMs (like PALM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said, "screw it", and scaled it from 500B to 50T parameters. I'm guessing there are reasons in the architecture that it would

The problem is that we have scaling laws for neural networks, and by those laws we simply do not have enough data to train 50T parameters. We would need to get that data somehow. Answering this question would cost a lot.
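To put rough numbers on the data problem, here is a back-of-the-envelope sketch. It assumes the commonly cited Chinchilla-style rule of thumb of about 20 training tokens per parameter and the usual approximation of about 6 FLOPs per parameter per token for training compute. Both constants, and the helper name `compute_optimal_budget`, are assumptions for illustration, not anything stated in this thread.

```python
# Back-of-the-envelope sketch; both constants are rules of thumb, not exact laws.
TOKENS_PER_PARAM = 20       # assumed Chinchilla-style compute-optimal ratio
FLOPS_PER_PARAM_TOKEN = 6   # assumed training cost approximation C ~ 6 * N * D


def compute_optimal_budget(n_params: float) -> tuple[float, float]:
    """Return (training tokens, training FLOPs) for a dense model of n_params."""
    tokens = TOKENS_PER_PARAM * n_params
    flops = FLOPS_PER_PARAM_TOKEN * n_params * tokens
    return tokens, flops


# Compare the two model sizes mentioned above: 500B and 50T parameters.
for n_params in (500e9, 50e12):
    tokens, flops = compute_optimal_budget(n_params)
    print(f"{n_params:.0e} params -> {tokens:.0e} tokens, {flops:.0e} FLOPs")
```

Under these assumptions, going from 500B to 50T parameters multiplies the token requirement by 100 (to roughly 10^15 tokens) and the training compute by 10,000.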

Co0k1eGal3xy t1_jbzi8wc wrote on March 12, 2023 at 10:45 PM

Double descent: more parameters are MORE data efficient.
Most of these LLMs barely complete 1 epoch, so there is no concern about overfitting currently.

MinaKovacs t1_jbz2gqw wrote on March 12, 2023 at 8:53 PM

I think the math clearly doesn't work out; otherwise, Google would have monetized it already. ChatGPT is not profitable or practical for search. The cost of hardware, power consumption, and slow performance are already at the limits. It will take something revolutionary, beyond binary computing, to make ML anything more than expensive algorithmic pattern recognition.