MinotaurOnLucy t1_j6fpaoj wrote
Reply to comment by suflaj in Why did the original ResNet paper not use dropout? by V1bicycle
Don’t they have two different purposes? As I understand it: batchnorm is used to keep activations well-scaled through a deep network so that the nonlinearities don’t saturate and kill neurons as the activation distributions drift and flatten out, while dropout is only meant to regularize training so the network doesn’t overfit.
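To make the contrast concrete, here’s a minimal PyTorch sketch (mine, not from the ResNet paper or the parent comment; the block and parameter names are just illustrative) showing where each layer typically sits and what it does:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Illustrative conv block using both BatchNorm and Dropout."""
    def __init__(self, channels: int, p_drop: float = 0.0):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        # BatchNorm normalizes each channel's activations (zero mean, unit variance
        # per mini-batch), keeping them in a range where the ReLU doesn't saturate
        # or die as the network gets deeper.
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # Dropout randomly zeroes activations during training only, acting purely
        # as a regularizer against overfitting; it is a no-op at eval time.
        self.drop = nn.Dropout2d(p=p_drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(self.relu(self.bn(self.conv(x))))

x = torch.randn(8, 64, 32, 32)
block = ConvBlock(64, p_drop=0.1)
print(block(x).shape)  # torch.Size([8, 64, 32, 32])
```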