gibs t1_itnalej wrote on October 24, 2022 at 10:09 PM

Reply to comment by TFenrir in Large Language Models Can Self-Improve by xutw21

I've definitely heard that idea expressed on Lex's podcast. I would say prediction is necessary but not sufficient for producing sentience. And language models are neither. I think the kinds of higher level thinking that we associate with sentience arise from specific architectures involving prediction networks and other functionality, which we aren't really capturing yet in the deep learning space.

TFenrir t1_itni82q wrote on October 24, 2022 at 11:06 PM

I don't necessarily disagree, but I also think we sometimes romanticize the brain a bit. We keep being surprised by how much we can achieve with language models, scale, and different training architectures. Chain of Thought, for example, seems to have become not just a tool for improving prompts but also a way to help with self-regulated fine-tuning.

I'm reading papers where Google combines more and more of these new techniques, architectures, and general lessons and they still haven't finished smushing them all together.

I wonder what happens when we smush more? What happens when we combine all these techniques: UL2, Flan, lookup, models making child models, etc.?

All that being said, I think I actually agree with you. I am currently intrigued by different architectures that allow for sparse activation and are more conducive to transfer learning. I really liked this paper:

https://arxiv.org/abs/2205.12755 (An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems)

gibs t1_itnnx1y wrote on October 24, 2022 at 11:50 PM

Just read the first part -- that is a super interesting approach. I'm convinced that robust continual learning is a critical component for AGI. It also reminds me of another episode of Lex Fridman's podcast where he had a cognitive scientist on (I forget who) whose main idea about human cognition was that we have a collection of mini-experts for any given cognitive task. They compete (or have their outputs summed) to give us a final answer to whatever the task is. The paper's approach of automatically compartmentalising knowledge into functional components is, I think, another critical part of the architecture for human-like cognition. Very very cool.
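
To make the "mini-experts" idea concrete, here is a toy sketch of the two combination rules mentioned above: experts competing (the most confident one wins) versus having their outputs summed (here, a confidence-weighted average). Everything in it is an assumption for illustration: the names `compete` and `blend`, the three hypothetical experts, and the hand-picked confidence values. It is not taken from the podcast or from the linked paper.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: an expert maps an input to (answer, confidence).
Expert = Callable[[float], Tuple[float, float]]


def compete(experts: Sequence[Expert], x: float) -> float:
    """Winner-take-all: return the answer of the most confident expert."""
    answer, _ = max((expert(x) for expert in experts), key=lambda pair: pair[1])
    return answer


def blend(experts: Sequence[Expert], x: float) -> float:
    """Summed outputs: confidence-weighted average of all expert answers."""
    outputs = [expert(x) for expert in experts]
    total = sum(conf for _, conf in outputs)
    if total == 0:
        raise ValueError("all experts reported zero confidence")
    return sum(ans * conf for ans, conf in outputs) / total


# Three made-up experts with fixed, hand-picked confidences.
def doubler(x: float) -> Tuple[float, float]:
    return 2 * x, 0.5


def squarer(x: float) -> Tuple[float, float]:
    return x * x, 0.25


def constant(x: float) -> Tuple[float, float]:
    return 10.0, 0.25


EXPERTS = [doubler, squarer, constant]

if __name__ == "__main__":
    print(compete(EXPERTS, 3.0))  # 6.0: doubler has the highest confidence
    print(blend(EXPERTS, 3.0))    # 7.75: (6*0.5 + 9*0.25 + 10*0.25) / 1.0
```

In a real system the confidences would presumably be learned or computed per input rather than fixed, which is roughly where sparse activation and routing come in.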