DigThatData wrote:
Reply to comment by thecodethinker in [R] Diffusion language models by benanne
attention is essentially a dynamically weighted dot product: each query is dotted with every key, and the softmaxed scores weight a sum over the values. if you haven't already seen this blog post, it's one of the more popular explanations: https://jalammar.github.io/illustrated-transformer/
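
for concreteness, here's a minimal NumPy sketch of single-head scaled dot-product attention (the variant that post walks through) — the function name, shapes, and example values are illustrative, not from the post:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention sketch (names/shapes illustrative).

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    # dot product of every query with every key gives raw similarity scores
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax turns each row of scores into weights that sum to 1 --
    # this is the "dynamic weighting", recomputed from the inputs each time
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # output is a weighted sum over the value vectors
    return weights @ V

# tiny usage example with random vectors
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))  # 3 queries, key dim 4
K = rng.standard_normal((5, 4))  # 5 keys
V = rng.standard_normal((5, 2))  # 5 values, value dim 2
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 2)
```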