ResourceResearch
ResourceResearch t1_iro8zof wrote
Reply to comment by DeepNonseNse in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
AFAIK it is not clear. In my personal experience, the total number of parameters matters more than how they are arranged across layers, i.e. a few wide layers do roughly the same job as many narrow layers.
Consider this paper for empirical insights on large models: https://arxiv.org/pdf/2001.08361.pdf
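To make that comparison concrete, here is a minimal sketch (plain Python; the layer widths are made up for illustration) that counts the parameters of a wide-and-shallow versus a narrow-and-deep fully connected net of roughly the same total size:

```python
# Sketch: compare parameter counts of two fully connected nets with different shapes.
# Layer widths are hypothetical; only the rough totals matter for the point above.

def mlp_param_count(layer_sizes):
    """Total weights + biases of an MLP with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

wide_shallow = [784, 512, 512, 10]            # a few wide hidden layers
narrow_deep  = [784, 256] + [256] * 8 + [10]  # many narrow hidden layers

print("wide & shallow:", mlp_param_count(wide_shallow))  # ~0.67M parameters
print("narrow & deep: ", mlp_param_count(narrow_deep))   # ~0.73M parameters
```

Both configurations land in the same parameter ballpark, which is the quantity the scaling-law paper linked above ties most closely to performance.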
ResourceResearch t1_iqunzdz wrote
Reply to comment by pia322 in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
Well, at least for ResNet there is a technical reason for its success: skip connections mitigate vanishing gradients. By the chain rule of differentiation, the identity path contributes a gradient term that does not shrink as the network gets deeper.
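To make the skip-connection point concrete, here is a minimal sketch of a residual block (assuming PyTorch; the layer sizes are illustrative, not taken from the ResNet paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the identity term gives gradients a direct path back.
    By the chain rule, dL/dx = dL/dy * (dF/dx + I), so the gradient never has
    to pass through F alone and is less prone to vanishing with depth."""
    def __init__(self, dim=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        return self.f(x) + x  # skip connection: add the input back

# Usage sketch: gradients flow through both F and the identity path.
x = torch.randn(8, 64, requires_grad=True)
block = ResidualBlock(64)
block(x).sum().backward()
print(x.grad.shape)
```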
ResourceResearch t1_ixpj3v5 wrote
Reply to [D] Informal meetup at NeurIPS next week by tlyleung
I'm there as well & interested.
As a suggestion: there are a bunch of bars along the riverside.