Submitted by TensorDudee t3_zloof9 in MachineLearning
nucLeaRStarcraft t1_j08cjvc wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
I agree with you: if we want to test the architecture itself, we should use the same training procedure, including pre-training.
My theory is that, given the current results of GPT-like models, which use transformers under the hood, and given that these groups have the compute power and data to train non-attention-based recurrent models instead, it's quite unlikely that the architecture isn't a main contributor.
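For context, the "attention" both comments refer to is the scaled dot-product operation at the core of both ViT and GPT. Here's a minimal single-head sketch in TensorFlow; the shapes and the identity q/k/v projections are simplifications for illustration, not TensorDudee's actual implementation:

```python
import tensorflow as tf

def self_attention(x):
    # x: (batch, seq_len, d_model) -- patch embeddings for ViT,
    # token embeddings for GPT-style models
    d_model = tf.cast(tf.shape(x)[-1], tf.float32)
    q, k, v = x, x, x  # real models use learned linear projections of x
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_model)
    weights = tf.nn.softmax(scores, axis=-1)  # each position attends to all positions
    return tf.matmul(weights, v)

# Usage: 2 images, 16 patches, 64-dim embeddings -> output shape (2, 16, 64)
out = self_attention(tf.random.normal([2, 16, 64]))
```

The key contrast with recurrent models is that the softmax weights let every position look at every other position in a single step, rather than passing information through a sequential hidden state.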