Submitted by V1bicycle t3_10ol7g6 in deeplearning
The ResNet paper by Kaiming He et al. does not use dropout in its models, yet many earlier architectures, such as AlexNet and VGGNet, benefited from it.
Why did the authors choose not to use dropout for ResNets? Is it because they use L2 regularization (weight decay) and batch normalization, which are forms of regularization that can substitute for dropout?
suflaj t1_j6fgjoj wrote
Dropout is less effective in CNNs, since convolutional feature maps are spatially correlated and dropping individual activations does little, and Batch Normalization largely replaces it: the noise from mini-batch statistics already has a regularizing effect.
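To make this concrete, here is a minimal sketch (in PyTorch, not the paper's released code) of a ResNet-style basic block: every convolution is followed by BatchNorm, and there is no dropout layer anywhere in the block.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet-style basic block: Conv-BN-ReLU twice plus an identity
    shortcut. Note there is no nn.Dropout layer -- BatchNorm after
    each convolution is the only regularization inside the block."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut

block = BasicBlock(64)
x = torch.randn(2, 64, 8, 8)
print(block(x).shape)  # same shape as the input
```

The L2 regularization the question mentions would be applied outside the model, e.g. via the `weight_decay` argument of `torch.optim.SGD`, as in the paper's training setup.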