Independent_Tax5335 t1_j2d4j4a wrote
I think if you apply batch norm after dropout, the statistics batch norm collects during training won't be correct at inference time: dropout shifts the activation distribution during training, but it is disabled at test time, so the running mean and variance no longer match what the layer actually sees. So I would do batch norm before dropout. On the other hand, batch norm has been shown to provide some regularization on its own, so it can also be fine to use batch norm alone and skip dropout. I would choose whichever approach works best for my specific use case.
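For what it's worth, here's a minimal PyTorch sketch of the batch-norm-before-dropout ordering (the block layout and names are just an illustrative assumption, not anything from a specific paper or codebase):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Fully connected block with BatchNorm applied before Dropout.

    Placing BatchNorm first means its running mean/variance are
    estimated on un-dropped activations, so they stay valid at
    inference time when dropout is turned off.
    """
    def __init__(self, in_features: int, out_features: int, p: float = 0.5):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.bn = nn.BatchNorm1d(out_features)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Order: Linear -> BatchNorm -> activation -> Dropout
        return self.drop(self.act(self.bn(self.linear(x))))

block = Block(128, 64)
block.eval()  # dropout becomes a no-op; bn uses its running stats
y = block(torch.randn(8, 128))
```

Calling `model.eval()` is what makes the distinction matter: dropout is disabled and batch norm switches to its running statistics, so those statistics need to have been collected on activations that look like the ones seen at test time.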