Viewing a single comment thread. View all comments

suflaj t1_je1uvo8 wrote

They probably redid the experiments themselves. Also, ResNets had some changes shortly after release I believe, and they could have used different pretraining weights. AFAIK He et al. never released their weights.

Furthermore, Wolfram and PyTorch pretrained weights are also around 22% top-1 error rate, so that is probably the correct error rate. Since PyTorch provides weights that reach 18% top-1 error rate with some small adjustments to the training procedure, it is possible the authors got lucky with the hyperparameters, or employed some techniques they didn't describe in the paper.

2