Submitted by V1bicycle t3_10ol7g6 in deeplearning

The ResNet paper by Kaiming He et al. does not use dropout in its models. Many models prior to ResNets, such as AlexNet and VGGNet, benefited from using dropout.

Why did the authors choose not to use dropout for ResNets? Is it because they use L2 regularization (weight decay) and batch normalization, which are forms of regularization that can substitute for dropout?

4

Comments

suflaj t1_j6fgjoj wrote

Dropout is less effective in CNNs and Batch Normalization replaces it.
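Not from the thread, but a minimal numpy sketch of where BN's regularizing side effect comes from: each example is normalized with *batch* statistics, so its output depends on the other examples in the minibatch, which injects noise during training. (The NCHW layout and per-channel statistics are assumptions matching standard conv feature maps.)

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize per channel, across batch and spatial dims (NCHW).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=(8, 4, 5, 5))  # batch of 8, 4 channels
y = batch_norm(x,
               gamma=np.ones((1, 4, 1, 1)),
               beta=np.zeros((1, 4, 1, 1)))
# After BN, each channel has roughly zero mean and unit variance,
# but the value for any single example shifts with the batch it is in.
```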

1

MinotaurOnLucy t1_j6fpaoj wrote

Don’t they have two different purposes? As I understand it, batchnorm is used to keep activations well-scaled through deep networks, so that nonlinear activations don't kill neurons whose distributions would otherwise have flattened out, while dropout is only meant to train the network uniformly to prevent overfitting.
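To make the contrast concrete, here is a rough sketch of standard (inverted) dropout, which is a different operation from the normalization above: zero each unit with probability p at train time and rescale survivors by 1/(1-p) so the expected activation is unchanged; at test time it's the identity.

```python
import numpy as np

def dropout(x, p, rng, train=True):
    # Inverted dropout: zero units with probability p, rescale the rest.
    if not train:
        return x
    mask = rng.random(x.shape) >= p  # keep with probability 1 - p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(42)
x = np.ones(1000)
y = dropout(x, p=0.5, rng=rng)
# Roughly half the units are zeroed; each survivor becomes 2.0,
# so the mean stays near 1.
```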

1

XecutionStyle t1_j6ggq37 wrote

BN is used to reduce covariate shift; it just happened to regularize. Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).

I doubt that what you're saying is true, i.e. that they're effectively the same. Try putting one after the other and see the effect. Two dropout layers or two BN layers, in contrast, have no problem co-existing.

edit: sorry, what I mean is that the variants of dropout that work with CNNs (that don't have detrimental effects) didn't exist back then.

1

suflaj t1_j6hfkdj wrote

> BN is used to reduce covariate shift; it just happened to regularize.

The first part was hypothesized, but never proven. It is a popular belief, like all the other hypotheses about why BN works so well.

> Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).

What does "becoming big" mean? Dropout was introduced in 2012 and has been used ever since. It was never big in the sense that you would always use it.

It is certainly false that Dropout came into use for CNNs because of ResNets or immediately after them, as the first paper proving that there is benefit in using Dropout for convolutional layers appeared in 2017: https://link.springer.com/chapter/10.1007/978-3-319-54184-6_12
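The conv-friendly variants alluded to here include channel-wise ("spatial") dropout: since neighboring activations in a feature map are strongly correlated, zeroing individual pixels does little, so whole feature maps are dropped instead. A rough numpy sketch (an illustration, not the linked paper's exact formulation):

```python
import numpy as np

def spatial_dropout(x, p, rng):
    # One keep/drop decision per (example, channel); the mask
    # broadcasts over the spatial dims, zeroing whole feature maps.
    n, c = x.shape[:2]
    keep = rng.random((n, c, 1, 1)) >= p
    return x * keep / (1.0 - p)

rng = np.random.default_rng(7)
x = np.ones((2, 8, 4, 4))  # 2 examples, 8 channels, 4x4 maps
y = spatial_dropout(x, p=0.25, rng=rng)
# Each (example, channel) slice is either entirely zero or
# entirely rescaled by 1 / (1 - p).
```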

> I doubt what you're saying is true, that they're effectively the same.

I never said that.

0