Submitted by minhrongcon2000 t3_z8kit4 in MachineLearning
IntelArtiGen t1_iycm9kk wrote
It depends on the accuracy you want. I can train a transformer in 30 min with 30k sentences on an RTX 2070 Super and get meaningful embeddings (similar words are close to each other). It works, but as with all models, it won't be SOTA unless you use billions of sentences and a much larger model with many more GPUs.
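Not my exact setup, but a rough sketch of what a small masked-token transformer trained for word embeddings could look like in PyTorch. The vocabulary size, model dimensions, and the random batch standing in for tokenized sentences are all placeholders, not real training data or tuned hyperparameters:

    # Minimal sketch: tiny transformer with a masked-token objective;
    # after training, nearby rows of the embedding matrix (by cosine
    # similarity) should correspond to semantically similar words.
    import torch
    import torch.nn as nn

    VOCAB_SIZE = 10_000   # assumed vocabulary size
    EMB_DIM = 128
    MAX_LEN = 32
    PAD_ID, MASK_ID = 0, 1  # reserved ids for padding and [MASK]

    class TinyMaskedLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=PAD_ID)
            self.pos = nn.Embedding(MAX_LEN, EMB_DIM)
            layer = nn.TransformerEncoderLayer(
                d_model=EMB_DIM, nhead=4, dim_feedforward=256, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.out = nn.Linear(EMB_DIM, VOCAB_SIZE)

        def forward(self, tokens):
            positions = torch.arange(tokens.size(1), device=tokens.device)
            h = self.emb(tokens) + self.pos(positions)
            h = self.encoder(h, src_key_padding_mask=tokens.eq(PAD_ID))
            return self.out(h)

    model = TinyMaskedLM()
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

    # One training step on a random batch (placeholder for tokenized sentences).
    tokens = torch.randint(2, VOCAB_SIZE, (16, MAX_LEN))
    mask = torch.rand(tokens.shape) < 0.15          # mask ~15% of tokens
    labels = tokens.masked_fill(~mask, -100)        # only score masked positions
    inputs = tokens.masked_fill(mask, MASK_ID)

    logits = model(inputs)
    loss = loss_fn(logits.view(-1, VOCAB_SIZE), labels.view(-1))
    loss.backward()
    optimizer.step()

    # Word embeddings to inspect: model.emb.weight

A model this size trains in minutes on a consumer GPU, which is the point: you get usable embeddings quickly, just not SOTA quality.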
I was told the same thing and I wouldn't agree: you need a huge pretraining process if you want SOTA results. If you can compromise, you don't need as much data, though an LSTM might perform better with little data.