[D] Do Transformers need a huge pretraining process? Submitted by minhrongcon2000 (t3_z8kit4) on November 30, 2022 at 7:06 AM in MachineLearning · 8 comments
DaLameLama t1_iyc7nha wrote on November 30, 2022 at 8:48 AM: I don't think that's true. It would imply that Bi-LSTMs reach good performance faster than Transformers, and that Transformers only catch up later during training. I've never seen proof of that, nor do my personal experiences confirm it.
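The claim could in principle be checked directly: train a Bi-LSTM and a Transformer encoder from scratch on the same task and watch which one improves faster early in training. Below is a minimal PyTorch sketch of such a comparison on a synthetic toy task; the architectures, the toy labels, and all hyperparameters are illustrative assumptions, not anything reported in the thread.

```python
# Sketch: compare early-training progress of a Bi-LSTM vs. a small Transformer
# encoder, both trained from scratch (no pretraining) on the same synthetic task.
# Everything here (task, sizes, learning rate) is an illustrative assumption.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, SEQ_LEN, N_TRAIN, D_MODEL, N_CLASSES = 50, 20, 2000, 64, 2

# Toy task: label = parity of how many tokens fall in the upper half of the vocab.
x = torch.randint(0, VOCAB, (N_TRAIN, SEQ_LEN))
y = ((x >= VOCAB // 2).sum(dim=1) % 2).long()

class BiLSTMClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, D_MODEL)
        self.rnn = nn.LSTM(D_MODEL, D_MODEL // 2, batch_first=True, bidirectional=True)
        self.head = nn.Linear(D_MODEL, N_CLASSES)

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h.mean(dim=1))  # mean-pool over time steps

class TransformerClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Parameter(torch.randn(1, SEQ_LEN, D_MODEL) * 0.02)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, dim_feedforward=128,
                                           batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, N_CLASSES)

    def forward(self, tokens):
        h = self.enc(self.emb(tokens) + self.pos)
        return self.head(h.mean(dim=1))  # mean-pool over positions

def train(model, epochs=10, batch_size=64):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        perm = torch.randperm(N_TRAIN)
        for i in range(0, N_TRAIN, batch_size):
            idx = perm[i:i + batch_size]
            opt.zero_grad()
            loss_fn(model(x[idx]), y[idx]).backward()
            opt.step()
        with torch.no_grad():
            acc = (model(x).argmax(dim=1) == y).float().mean().item()
        print(f"epoch {epoch + 1:2d}  train acc {acc:.3f}")

print("Bi-LSTM (no pretraining):")
train(BiLSTMClassifier())
print("Transformer encoder (no pretraining):")
train(TransformerClassifier())
```

If the commenter is right, neither curve should show the Bi-LSTM pulling far ahead early on; of course, a toy setup like this only illustrates the shape of the experiment, not the conclusion.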