visarga OP t1_iy8xctv wrote
Reply to comment by beezlebub33 in [r] The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable - LessWrong by visarga
Oh yes, for people who prefer video, there is also:
CS25 I Stanford Seminar - Transformer Circuits, Induction Heads, In-Context Learning