beezlebub33 t1_iy8b3ht wrote
This is very interesting, though somewhat dense and hard to follow if you don't have the background.
I recommend reading an article they reference: A Mathematical Framework for Transformer Circuits https://transformer-circuits.pub/2021/framework/index.html
If nothing else, that paper will explain that OV means output-value:
>Attention heads can be understood as having two largely independent computations: a QK (“query-key”) circuit which computes the attention pattern, and an OV (“output-value”) circuit which computes how each token affects the output if attended to.
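To make that decomposition concrete, here's a minimal NumPy sketch (shapes, weights, and variable names are made up for illustration, not taken from the paper): the attention pattern depends only on W_Q and W_K, the per-token contribution depends only on W_V and W_O, and combining them reproduces the usual attention-head output.

```python
import numpy as np

d_model, d_head, seq_len = 64, 16, 8
rng = np.random.default_rng(0)

# Per-head weight matrices (illustrative random values, not real model weights)
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))
W_V = rng.normal(size=(d_model, d_head))
W_O = rng.normal(size=(d_head, d_model))

x = rng.normal(size=(seq_len, d_model))  # residual-stream activations

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# QK ("query-key") circuit: only W_Q and W_K determine where each position attends.
scores = (x @ W_Q) @ (x @ W_K).T / np.sqrt(d_head)
attn_pattern = softmax(scores)                      # (seq_len, seq_len)

# OV ("output-value") circuit: only W_V and W_O determine what an attended
# token writes to the output; it acts as the single matrix W_V @ W_O per token.
ov_out_per_token = x @ (W_V @ W_O)                  # (seq_len, d_model)

# The head's output is the attention pattern mixing the OV contributions.
head_output = attn_pattern @ ov_out_per_token

# Matches the usual formulation, confirming the two circuits are independent.
reference = attn_pattern @ (x @ W_V) @ W_O
assert np.allclose(head_output, reference)
```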
visarga OP t1_iy8xctv wrote
Oh yes, for people who prefer video, there is also:
CS25 I Stanford Seminar - Transformer Circuits, Induction Heads, In-Context Learning
LostInSpace2981 t1_iyb7adw wrote
This is great, thank you for sharing!