2600_yay t1_j0fc7j4 wrote on December 16, 2022 at 5:40 AM
Reply to [D] Trying to find paper about n-grams in early transformer layers by soraki_soladead
The authors of "Are Neighbors Enough?" swap out self-attention in a Transformer for a multi-head neural n-gram model. Perhaps that's what you're looking for?
https://arxiv.org/abs/2207.13354
2600_yay t1_isgodr8 wrote on October 15, 2022 at 9:10 PM
Reply to Painting Pumpkins, Me, Gouache, 2022 by sijesn
The fur hatching on the cat is so pleasing to look at!