RobKnight_ t1_iqu7c54 wrote on October 3, 2022 at 2:56 AM
Reply to comment by 029187 in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
Deeper layers in CNNs are not constrained to locality: each successive convolution enlarges the receptive field, so units in deep layers respond to progressively larger regions of the input.
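For intuition, here's a small sketch of how the receptive field grows with depth in a stack of convolutions. The layer configuration is hypothetical, chosen just to illustrate the arithmetic:

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of stacked conv layers.

    layers: list of (kernel_size, stride) tuples, first layer first.
    """
    rf, jump = 1, 1  # jump = spacing between adjacent outputs, in input pixels
    for k, s in layers:
        rf += (k - 1) * jump  # each layer extends the field by (k-1) * jump
        jump *= s             # striding spreads subsequent growth further apart
    return rf

# Five 3x3 convs, one with stride 2 (illustrative, not from any real net):
layers = [(3, 1), (3, 1), (3, 2), (3, 1), (3, 1)]
print(receptive_field(layers))  # -> 15
```

So even with only 3x3 kernels, a unit five layers deep already "sees" a 15-pixel-wide patch of the input, and the field keeps growing with depth.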