Submitted by netw0rkf10w t3_zmpdo0 in MachineLearning
netw0rkf10w OP t1_j0gcgxy wrote
Thanks. DeiT is actually a very nice paper from which one can learn a lot of things. But the training regimes that they used seem a bit long to me: 300 to 800 epochs. The authors of MAE managed to achieve 82.3% for ViT-B after only 100 epochs, so I'm wondering if anyone in the literature has ever been able to match that.
TimDarcet t1_j1w6ifs wrote
I think the supervised training they report in MAE is 300 epochs; they used a different recipe from the one for finetuning (appendix, page 12, Table 11).
netw0rkf10w OP t1_j2939o2 wrote
You are right, indeed. Not sure how I missed that. I guess one can conclude that DeiT III is currently SoTA for training ViTs from scratch.