[D] Do Transformers need a huge pretraining process? Submitted by minhrongcon2000 on November 30, 2022 at 7:06 AM in MachineLearning (8 comments)
suflaj wrote on November 30, 2022 at 12:08 PM: Depends on the transformer, but generally yes. Pretraining BERT costs around $10k in compute, maybe less now. For a similar task, you can train a BiLSTM model from scratch on a single consumer card in a day or so.
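For reference, here is a minimal sketch of the kind of BiLSTM classifier suflaj is contrasting with a pretrained transformer. Everything in it (vocabulary size, hidden size, the mean-pooling head, the classification task) is an illustrative assumption, not something stated in the thread; the point is only that such a model is small enough to train from scratch on one GPU.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Small bidirectional LSTM text classifier, trainable from scratch (no pretraining)."""
    def __init__(self, vocab_size=30000, embed_dim=300, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)       # (batch, seq_len, 2 * hidden_dim)
        pooled = outputs.mean(dim=1)           # mean-pool over the sequence dimension
        return self.classifier(pooled)         # (batch, num_classes)

# Toy training step on random data; a real run would feed a tokenized corpus.
model = BiLSTMClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

token_ids = torch.randint(1, 30000, (32, 128))  # batch of 32 sequences of length 128
labels = torch.randint(0, 2, (32,))
logits = model(token_ids)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

A model of this size has a few million parameters, which is why a single consumer GPU can train it in hours rather than the multi-GPU days-to-weeks typical of transformer pretraining.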