Submitted by hardmaru t3_ys36do in MachineLearning
canbooo t1_ivx9yjn wrote
Very interesting stuff. I just skimmed through it and will definitely read it in more depth, but how does this break symmetry?
jimmiebtlr t1_ivy59yw wrote
Haven't read it yet, but wouldn't symmetry only exist for two nodes if their input and output weights have the same 1s and 0s?
canbooo t1_ivydtlt wrote
You are right, and what I am asking may be practically irrelevant; I really should RTFP. However, consider the edge case of one layer with one input and one output. Every node whose input weight is 1 sees the same gradient, and likewise for the nodes whose input weight is 0. Increasing the number of inputs makes it combinatorially improbable to end up with the same configuration, but increasing the number of nodes in a layer makes it more likely. So it could be relevant for low-dimensional inputs or for models with a narrow bottleneck. I am sure the authors have already thought about this problem and either discarded it as quite unlikely in their tested settings or have a solution/analysis somewhere in the paper, hence my question.
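(A minimal sketch of the symmetry concern described above, not taken from the paper: a tiny 1-input, 2-hidden-unit, 1-output network with ReLU activations and squared error, where both hidden units start with identical binary weights. Manual backprop shows both units receive the same gradient, so gradient descent alone cannot tell them apart. All names and values are illustrative assumptions.)

```python
import numpy as np

x = np.array([0.7])            # single scalar input (illustrative value)
t = np.array([1.0])            # target
w1 = np.array([1.0, 1.0])      # hidden weights, both initialized to 1
w2 = np.array([1.0, 1.0])      # output weights, both initialized to 1

# Forward pass with ReLU hidden units and squared-error loss
h_pre = w1 * x                 # pre-activations, shape (2,)
h = np.maximum(h_pre, 0.0)     # ReLU
y = w2 @ h                     # scalar output
loss = (y - t) ** 2

# Backward pass (chain rule by hand)
dy = 2.0 * (y - t)             # dL/dy
dw2 = dy * h                   # dL/dw2: identical entries
dh = dy * w2                   # dL/dh
dw1 = dh * (h_pre > 0) * x     # dL/dw1: identical entries

print("grad w1:", dw1)         # both components equal -> symmetric updates
print("grad w2:", dw2)         # both components equal -> symmetric updates
```

Since the two hidden units start from the same weights, every gradient step moves them identically, which is the classic symmetry-breaking problem the question is getting at.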