TimDarcet
TimDarcet t1_j0cpy9m wrote
Reply to comment by TimDarcet in [D] What are the strongest plain baselines for Vision Transformers on ImageNet? by netw0rkf10w
There's also this one, with very strong results, but it's a bit less straightforward to train.
TimDarcet t1_j0cpta3 wrote
Reply to [D] What are the strongest plain baselines for Vision Transformers on ImageNet? by netw0rkf10w
I think DeiT III is pretty much SOTA.
TimDarcet t1_j1w6ifs wrote
Reply to comment by netw0rkf10w in [D] What are the strongest plain baselines for Vision Transformers on ImageNet? by netw0rkf10w
I think the supervised training they report in MAE is 300 epochs; they used a different recipe from the one for fine-tuning (appendix, page 12, Table 11).