Submitted by alfredr t3_11vs3oe in MachineLearning

Recently, John Carmack suggested the creation of a "canonical list of references from a leading figure," referring to a never-released reading list given to him by Ilya Sutskever.

While there may be an undue interest in that specific list, ML research is such a big field that it's difficult to know where to start. Which major papers are relevant to the state-of-the-art work being done in 2023? Perhaps we can crowd-source a list here?

47

Comments

millenial_wh00p t1_jcuksz1 wrote

What aspects? New models? Interpretability? Pipelines and scalability? Reinforcement learning? Data assurance? Too many subfields to narrow down in this question to produce a decent list, imo.

With that said, my subfield is assurance, and some of Anthropic’s work on interpretability and privileged bases is extremely interesting. Their toy models paper and the one they released last week about privileged bases in the transformer residual stream present a very novel way of thinking about model explainability.

29

alfredr OP t1_jcumejc wrote

I'm an outsider interested in learning the landscape, so my intent is to leave the question open-ended, but I'm broadly interested in architectural things like layer design, attention mechanisms, regularization, and model compression, as well as bigger-picture considerations like interpretability, explainability, and fairness.

9

millenial_wh00p t1_jcun8jw wrote

Well, beware open-ended questions about AI/ML research in the current “gold rush” environment. If you’re into explainability and interpretability, some folks are looking into combinatorial methods for features and their interactions to predict data coverage. This, plus Anthropic’s papers, starts to open up some new ground in interpretability for CV.

https://arxiv.org/pdf/2201.12428.pdf

11

alfredr OP t1_jcuoqg4 wrote

Point taken on the "gold rush". My background is CS theory, so the incorporation of combinatorial methods feels right at home. Along these lines, are you aware of any work incorporating (combinatorial) logic verification into generative language models? The end goal would be improved argument synthesis (e.g., mathematical proofs).

4

millenial_wh00p t1_jcuq0zo wrote

No, unfortunately. Most of my work is with tabular data, with a bit of computer vision; I haven’t looked into any applications of language models in that area. In theory, the tokenization in language models shouldn’t be much different from features in tabular/imagery data. There are probably some parallels worth exploring there; I’m just not aware of any papers.

7

Expensive-Type2132 t1_jcvddj4 wrote

If you’re outside the community, it might be more beneficial to look at applied papers to get an understanding of tasks, objective functions, datasets, training strategies, etc., especially during this period when there isn’t that much architectural diversity. But, nevertheless, read whatever you’re drawn to read!

2

fromnighttilldawn t1_jcxgr6b wrote

I don't read any of the papers because there is basically no way to re-implement them or independently verify them. People can feel shocked, surprised, amazed, or enlightened all they want while reading a paper, but in truth people still have no idea how any of this truly works.

Before, at least you had a mathematical model to work with, which showed that even at small scale an idea could lead to something that works as promised in a larger-scale ML model.

Nowadays OpenAI can claim that Jesus came back and cleaned the data for their model, and we would have no way to verify the claim.

18

lmericle t1_jcyxiex wrote

Well, no, it isn't. You are looking for machine learning research. That list is only about LLMs, a very specific and over-hyped sub-sub-application of ML techniques.

If all you want is to attach yourself to the hype cycle, then that link still won't be enough, but at least it's a start.

0

alfredr OP t1_jcz3keg wrote

I understand that it’s about LLMs and that it is not comprehensive — also that the site author has (perhaps questionably) embedded some of their own work in the list. That said, it does otherwise appear to be a list of influential papers representing a current major thrust.

I did not downvote you, btw

2