TimDarcet

TimDarcet t1_j1w6ifs wrote on December 27, 2022 at 9:19 PM

Reply to comment by netw0rkf10w in [D] What are the strongest plain baselines for Vision Transformers on ImageNet? by netw0rkf10w

I think the supervised training they report in MAE is 300 epochs; they used a different recipe from the one for fine-tuning (appendix, page 12, table 11)

TimDarcet t1_j0cpy9m wrote on December 15, 2022 at 6:18 PM

Reply to comment by TimDarcet in [D] What are the strongest plain baselines for Vision Transformers on ImageNet? by netw0rkf10w

There's also this one with very strong results, but it's a bit less straightforward to train

TimDarcet t1_j0cpta3 wrote on December 15, 2022 at 6:17 PM

Reply to [D] What are the strongest plain baselines for Vision Transformers on ImageNet? by netw0rkf10w

I think DeiT III is pretty SOTA