Independent_Tax5335 t1_j2d4j4a wrote
Reply to [D] Does it make sense to use dropout and layer normalization in the same model? by Beneficial_Law_5613
I think if you apply batch norm after dropout, the statistics batch norm collects during training won't match the activations it sees at inference time, when dropout is disabled. So I would place batch norm before dropout. On the other hand, batch norm has been shown to provide some regularization on its own, so it's also fine to just use batch norm by itself. I'd choose whichever approach works best for the specific use case.
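For illustration, here is a minimal PyTorch-style sketch of the ordering I mean, with batch norm before dropout. The layer sizes and dropout rate are just placeholders, not a recommendation:

    import torch.nn as nn

    # Sketch of a hidden block: normalize first, then drop units.
    # In model.eval(), dropout is disabled and batch norm switches to its
    # running statistics, which were collected on activations that dropout
    # had not distorted.
    block = nn.Sequential(
        nn.Linear(256, 128),   # placeholder sizes, for illustration only
        nn.BatchNorm1d(128),   # batch norm before dropout
        nn.ReLU(),
        nn.Dropout(p=0.5),     # placeholder dropout rate
    )

If you reversed the last two layers, the batch norm running mean/variance would be estimated on dropout-thinned activations during training but applied to full activations at inference, which is the mismatch described above.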