BrohammerOK t1_j2ekoyj wrote

If you do use both in the same network, dropout should never be applied right before batch or layer norm, because the activations zeroed out by dropout would skew the mean and variance statistics the norm layer computes (and create a train/test mismatch, since dropout is disabled at inference). As an example, it is common in CNNs to use batch norm inside the conv blocks and apply dropout only after the global average pooling, right before the final fc layer. Sometimes you even see a form of dropout between conv blocks; take a look at EfficientNet by Google.
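A minimal PyTorch sketch of that placement (the layer sizes, dropout rate, and class name here are just illustrative assumptions, not from any specific paper):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Batch norm inside the conv blocks; dropout only after global
    average pooling, so the zeroed features never feed a norm layer."""
    def __init__(self, num_classes=10, p_drop=0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(32),   # norm sees full, undropped activations
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.dropout = nn.Dropout(p_drop)    # dropout after GAP, before fc
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)
        x = self.dropout(x)  # zeros here no longer reach any norm layer
        return self.fc(x)

model = SmallCNN()
out = model(torch.randn(8, 3, 32, 32))  # -> shape [8, 10]
```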
