Submitted by TensorDudee t3_zloof9 in MachineLearning
murrdpirate t1_j087lji wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
>I think overfitting would still happen, but we’d still get better validation performance.
I think by definition, overfitting means your validation performance decreases (or at least stops increasing) with further training.
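To make that definition concrete, here is a minimal sketch of one way to read it off a pair of loss curves. The function name `overfitting_onset` and the loss values are made up for illustration, and it uses loss (lower is better) rather than accuracy. It flags the best validation epoch only if training loss keeps dropping afterwards while validation loss never beats that point again.

```python
def overfitting_onset(train_loss, val_loss):
    """Return the index of the best validation epoch if training loss kept
    improving after it while validation loss never improved again.
    Return None if no such point exists in the recorded history."""
    # Index of the lowest validation loss (first one, if there are ties).
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    if best == len(val_loss) - 1:
        # Validation was still at its best on the last epoch: no sign of overfitting yet.
        return None
    # Overfitting in the sense above: train keeps improving, validation does not.
    train_still_improving = train_loss[-1] < train_loss[best]
    return best if train_still_improving else None


# Made-up curves, one value per epoch (not real CIFAR-10 results).
train = [1.00, 0.70, 0.50, 0.30, 0.20]
val = [1.10, 0.80, 0.70, 0.75, 0.90]
print(overfitting_onset(train, val))
```

With these hypothetical numbers, validation loss bottoms out at index 2 while training loss keeps falling, so that is the epoch the function reports.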
>So maybe VIT for cifar-10 didn’t add any additional capabilities that were worth it for the problem, just additional complexity
Depends on what you mean by "the problem." The problem could be:
- Get the best possible performance on CIFAR-10 Test
- Get the best possible performance on CIFAR-10 Test, but only train on CIFAR-10 Train
Even if it were the second one, you could likely just reduce the complexity of the ViT model and have it outperform other models. Or keep it the same, but use heavy regularization during training.
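As a rough illustration of what "reduce the complexity" can mean, the sketch below counts the trainable parameters of a standard ViT encoder from its hyperparameters. The helper `vit_param_count` is hypothetical, it assumes the usual layout (linear patch embedding, class token, learned position embeddings, pre-norm encoder blocks with biases, linear head), and the "reduced" configuration is just an example I picked, not one from the post. It only covers the model-size option, not regularization.

```python
def vit_param_count(image_size, patch_size, width, depth, mlp_dim,
                    num_classes, channels=3):
    """Approximate trainable parameter count of a standard ViT classifier."""
    num_patches = (image_size // patch_size) ** 2
    # Linear projection of flattened patches (weights + bias).
    patch_embed = patch_size * patch_size * channels * width + width
    cls_token = width
    # One learned position embedding per patch, plus one for the class token.
    pos_embed = (num_patches + 1) * width
    # Q, K, V and output projections, each width x width with a bias.
    attention = 4 * width * width + 4 * width
    # Two dense layers: width -> mlp_dim -> width, with biases.
    mlp = 2 * width * mlp_dim + mlp_dim + width
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * 2 * width
    encoder = depth * (attention + mlp + layer_norms)
    final_norm = 2 * width
    head = width * num_classes + num_classes
    return patch_embed + cls_token + pos_embed + encoder + final_norm + head


# CIFAR-10 sized inputs: 32x32 RGB images, 4x4 patches, 10 classes.
# ViT-Base encoder hyperparameters as a reference point.
base = vit_param_count(32, 4, width=768, depth=12, mlp_dim=3072, num_classes=10)
# Hypothetical reduced configuration, chosen only for illustration.
small = vit_param_count(32, 4, width=192, depth=6, mlp_dim=384, num_classes=10)

print(f"ViT-Base-sized: {base:,} parameters")
print(f"reduced:        {small:,} parameters")
```

CIFAR-10 Train has only 50,000 images, so a model with tens of millions of parameters has a lot of room to memorize it. Cutting width and depth shrinks that capacity directly.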