yannbouteiller t1_iydt54w wrote
Reply to comment by entropyvsenergy in [D] Does Transformer need huge pretraining process? by minhrongcon2000
Calling fully connected networks "less flexible" than transformers sounds misleading. As far as I can see, transformers, although very generic, have much more inductive bias than, e.g., an MLP that takes the whole sequence of word embeddings as input.
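A minimal sketch (assuming PyTorch; the sequence length, embedding size, and layer sizes below are illustrative, not from the comment) of the contrast being made: the flat MLP learns a separate weight for every (position, dimension) pair, while a transformer layer shares its projection weights across positions and routes interactions through attention, which is a strong structural constraint.

```python
import torch
import torch.nn as nn

seq_len, d_model = 128, 64  # hypothetical sizes for illustration

# "Generic" MLP over the whole sequence: flattening the embeddings means
# every (position, dimension) pair gets its own weights -- no structure
# is assumed, and parameter count scales with seq_len squared.
mlp = nn.Sequential(
    nn.Flatten(),  # (B, seq_len, d_model) -> (B, seq_len * d_model)
    nn.Linear(seq_len * d_model, seq_len * d_model),
    nn.ReLU(),
)

# Transformer encoder layer: the same projections are applied at every
# position, and positions interact only through attention -- much more
# inductive bias than the unconstrained MLP above.
attn = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)

x = torch.randn(2, seq_len, d_model)
print(mlp(x).shape)   # torch.Size([2, 8192])
print(attn(x).shape)  # torch.Size([2, 128, 64])

# Parameter counts: the MLP's grow with seq_len, the transformer layer's do not.
print(sum(p.numel() for p in mlp.parameters()))   # ~67M
print(sum(p.numel() for p in attn.parameters()))  # ~0.28M
```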