harharveryfunny t1_jbjxolz wrote
Reply to comment by EmbarrassedHelp in [D] Why are so many tokens needed to train large language models? by blacklemon67
The LLM name for things like GPT-3 seems to have stuck, which IMO is a bit unfortunate since it's rather misleading. They certainly wouldn't need the amount of data they do if the goal were merely a language model, nor would we have needed to progress past smaller models like GPT-1. The "predict next word" training/feedback may not have changed, but the capabilities people are hoping to induce in these larger/ginormous models are now way beyond language and into the realms of world models, semantics and thought.
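For anyone unfamiliar, that "predict next word" objective is just next-token prediction with a cross-entropy loss, and it stays the same no matter how large the model gets. Here's a minimal toy sketch of that loss (random tensors standing in for a real model and tokenizer, so this isn't any specific model's code):

```python
import torch
import torch.nn.functional as F

# Toy illustration of the next-token prediction objective (assumed setup,
# not GPT-3's actual training code): logits at position t are scored
# against the token at position t+1.
vocab_size, seq_len, batch = 100, 8, 2
tokens = torch.randint(0, vocab_size, (batch, seq_len))   # stand-in token ids
logits = torch.randn(batch, seq_len, vocab_size)           # stand-in model output

pred = logits[:, :-1, :].reshape(-1, vocab_size)  # predictions for positions 1..T-1
target = tokens[:, 1:].reshape(-1)                # the "next words" they should predict
loss = F.cross_entropy(pred, target)
print(loss.item())
```

The point being: the objective above is dead simple, so whatever world modeling emerges comes from scale and data, not from any change to the loss.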