eliyah23rd t1_ivf9vsf wrote
The author's argument seems to be:
- There are many people writing machine learning papers without understanding core statistical principles.
- The best explanation for this is that there is so much data that no valid method exists for distinguishing genuine correlations from accidental ones.
- Therefore, big data will produce nothing of much value from now on, since we have too much data already.
There are many procedures in place that give some protection against overfitting. Random pruning is one of them.
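If "random pruning" here means dropout-style random deactivation of units (my assumption; the comment doesn't name a specific technique), a minimal NumPy sketch of the idea:

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(activations, p=0.5, training=True):
    # Inverted dropout: randomly zero a fraction p of units during
    # training, scaling survivors by 1/(1-p) so the expected value
    # of each activation is unchanged. At inference, do nothing.
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

a = np.ones(8)
print(dropout(a, p=0.5))        # some zeros, survivors scaled to 2.0
print(dropout(a, training=False))  # unchanged at inference
```

Randomly disabling units forces the network not to rely on any single spurious feature, which is why it acts as a guard against overfitting.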
GPT-3 (and its siblings) and DALL-E 2 (and its siblings) would not be possible without scraping a significant fraction of all the textual data available (DALL-E obviously combines this with images). They overcome overfitting using hundreds of billions of parameters, with the counts still climbing. The power requirements of training these systems alone are mind-boggling.
Much medical data that is fed into learning systems is absurdly underfitted. Imagine a (rather dystopian) world where all health indicators of all people taking specific drugs were fed into learning systems. A doctor might one day know whether a specific drug will be effective for you specifically.
There is much yet to learn. To make a falsifiable prediction, corporations will be greedily seeking to increase their data input for decades to come. Power needs will continue to grow. This will be driven by the success (in their own value terms) of their procedures and not blind adherence to false assumptions as the author might seem to suggest.
visarga t1_ivioifb wrote
> They overcome overfitting using hundreds of billions of parameters
Increasing model size usually increases overfitting. The opposite effect comes from increasing the dataset size.
eliyah23rd t1_ivjri3f wrote
Thank you for your reply.
Perhaps I phrased it poorly. You are correct, of course, that increasing model size tends to increase overfitting in the usual sense: a failure to generalize, which shows up as bad results on new data.
I spoke in the context of this article, which claims that spurious generalizations are found. LLMs scale two things up in parallel in order to produce the amazing results that they do: the quantity of data and the number of parameters.