DigThatData wrote:
Reply to comment by thecodethinker in [R] Diffusion language models by benanne
attention is essentially a dynamically weighted dot product: each query is dotted with every key, and the softmaxed scores weight a sum over the values. if you haven't already seen this blog post, it's one of the more popular explanations: https://jalammar.github.io/illustrated-transformer/
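
for concreteness, here's a minimal NumPy sketch of single-head scaled dot-product attention (the variant that post walks through) — the function name, shapes, and example values are illustrative, not from the post:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention sketch (names/shapes illustrative).

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    # dot product of every query with every key gives raw similarity scores
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax turns each row of scores into weights that sum to 1 --
    # this is the "dynamic weighting", recomputed from the inputs each time
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # output is a weighted sum over the value vectors
    return weights @ V

# tiny usage example with random vectors
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))  # 3 queries, key dim 4
K = rng.standard_normal((5, 4))  # 5 keys
V = rng.standard_normal((5, 2))  # 5 values, value dim 2
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 2)
```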