IntelArtiGen t1_iycm9kk wrote on November 30, 2022 at 12:10 PM

It depends on the accuracy you want. I can train a transformer in 30 minutes on 30k sentences with an RTX 2070 Super and get meaningful embeddings (similar words end up close to each other). It works, but as with any model, it won't be SOTA unless you use billions of sentences, a much larger model, and many more GPUs.
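One common way to check whether "similar words are close to each other" is to compare word vectors with cosine similarity and look at each word's nearest neighbors. Here is a minimal sketch in plain Python. It assumes you have already extracted one vector per word from the trained model. The vectors below are made-up toy values, not real model output, and `cosine` and `nearest` are hypothetical helpers, not part of any library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings, k=2):
    """Return the k other words whose vectors are most cosine-similar to `word`."""
    query = embeddings[word]
    scores = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]


# Hypothetical toy vectors standing in for embeddings taken from a trained model.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
    "pear": [0.15, 0.1, 0.95],
}

print(nearest("king", toy_embeddings, k=1))  # prints ['queen']
```

With real embeddings, you would run the same neighbor lookup on a few words you know well. If the neighbors look sensible, the small model has learned something useful, even if it is far from SOTA.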

I was told the same thing, and I don't agree. You need a huge pretraining process if you want SOTA results, but if you can compromise on accuracy, you don't need nearly as much data. That said, an LSTM might perform better when you only have a little data.