Submitted by Beneficial_Law_5613 t3_zzqzoy in MachineLearning
Some time ago I saw an article saying it's not recommended to use dropout together with any kind of normalization (like batch or layer norm) in a model, but I'm not sure why. Any suggestions about that?
Independent_Tax5335 t1_j2d4j4a wrote
I think if you apply batch norm after dropout during training, the running statistics batch norm collects won't be correct at inference time, since they were computed on dropped-out activations. So I would put batch norm before dropout. On the other hand, batch norm has been shown to provide some amount of regularization on its own, so it's also fine to just use batch norm without dropout. I would choose whichever approach works best for your specific use case.
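To illustrate the ordering suggested above, here is a minimal PyTorch sketch (my own example, not from the thread); the layer sizes and dropout rate are arbitrary assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical MLP block: BatchNorm is placed before Dropout, so its running
# statistics are computed on the full (non-dropped) activations during training.
block = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),   # normalize first
    nn.ReLU(),
    nn.Dropout(p=0.5),    # drop activations after normalization
)

block.eval()              # at inference, BatchNorm uses running stats and Dropout is a no-op
x = torch.randn(32, 128)  # batch of 32 samples with 128 features
out = block(x)
print(out.shape)          # torch.Size([32, 64])
```

If Dropout were placed before BatchNorm1d instead, the running mean and variance would be estimated on zeroed-out activations during training but applied to full activations at inference, which is the mismatch described above.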