yannbouteiller t1_iydt54w wrote
Reply to comment by entropyvsenergy in [D] Does Transformer need huge pretraining process? by minhrongcon2000
Calling fully connected networks "less flexible" than transformers sounds misleading. As far as I can see, transformers, although very generic, have much more inductive bias than, e.g., an MLP that takes the whole sequence of word embeddings as input.
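A minimal sketch (assuming PyTorch; the sequence length, embedding size, and layer sizes below are illustrative, not from the comment) of the contrast being made: the flat MLP learns a separate weight for every (position, dimension) pair, while a transformer layer shares its projection weights across positions and routes interactions through attention, which is a strong structural constraint.

```python
import torch
import torch.nn as nn

seq_len, d_model = 128, 64  # hypothetical sizes for illustration

# "Generic" MLP over the whole sequence: flattening the embeddings means
# every (position, dimension) pair gets its own weights -- no structure
# is assumed, and parameter count scales with seq_len squared.
mlp = nn.Sequential(
    nn.Flatten(),  # (B, seq_len, d_model) -> (B, seq_len * d_model)
    nn.Linear(seq_len * d_model, seq_len * d_model),
    nn.ReLU(),
)

# Transformer encoder layer: the same projections are applied at every
# position, and positions interact only through attention -- much more
# inductive bias than the unconstrained MLP above.
attn = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)

x = torch.randn(2, seq_len, d_model)
print(mlp(x).shape)   # torch.Size([2, 8192])
print(attn(x).shape)  # torch.Size([2, 128, 64])

# Parameter counts: the MLP's grow with seq_len, the transformer layer's do not.
print(sum(p.numel() for p in mlp.parameters()))   # ~67M
print(sum(p.numel() for p in attn.parameters()))  # ~0.28M
```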