[D] Do Transformers need a huge pretraining process? Submitted by minhrongcon2000 on November 30, 2022 at 7:06 AM in MachineLearning (8 comments)
suflaj wrote on November 30, 2022 at 12:08 PM: Depends on the transformer, but generally yes. Pretraining BERT costs around $10k in compute, maybe less now. For a similar task, you can train a BiLSTM model from scratch on a single consumer card in a day or so.
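For reference, here is a minimal sketch of the kind of BiLSTM classifier suflaj is contrasting with a pretrained transformer. Everything in it (vocabulary size, hidden size, the mean-pooling head, the classification task) is an illustrative assumption, not something stated in the thread; the point is only that such a model is small enough to train from scratch on one GPU.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Small bidirectional LSTM text classifier, trainable from scratch (no pretraining)."""
    def __init__(self, vocab_size=30000, embed_dim=300, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)       # (batch, seq_len, 2 * hidden_dim)
        pooled = outputs.mean(dim=1)           # mean-pool over the sequence dimension
        return self.classifier(pooled)         # (batch, num_classes)

# Toy training step on random data; a real run would feed a tokenized corpus.
model = BiLSTMClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

token_ids = torch.randint(1, 30000, (32, 128))  # batch of 32 sequences of length 128
labels = torch.randint(0, 2, (32,))
logits = model(token_ids)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

A model of this size has a few million parameters, which is why a single consumer GPU can train it in hours rather than the multi-GPU days-to-weeks typical of transformer pretraining.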