Submitted by faker10101891 t3_10cxuo2 in MachineLearning
KBM_KBM t1_j4j10y6 wrote
You can pre-train and fine-tune energy-efficient language models such as ELECTRA or ConvBERT on this GPU. The batch size probably can't be very large, though, so the gradient descent will be a bit noisy; also keep the corpus size as small as possible.
Look into the BioELECTRA paper, which also comes with a notebook showing how the author trained it.
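For a rough idea, here's a minimal sketch of fine-tuning a small ELECTRA model with Hugging Face Transformers on a memory-limited GPU. The model name, dataset, and hyperparameters are illustrative assumptions on my part, not the BioELECTRA setup; the point is just the small per-device batch plus gradient accumulation to compensate for the noisy updates.

```python
# Sketch: fine-tune ELECTRA-small under a tight VRAM budget.
# Model/dataset/hyperparameters below are assumptions for illustration only.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

model_name = "google/electra-small-discriminator"  # small variant fits modest VRAM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Illustrative corpus; keep it modest, as suggested above.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="electra-finetune",
    per_device_train_batch_size=8,    # small batch to fit in GPU memory
    gradient_accumulation_steps=4,    # effective batch of 32 to smooth noisy gradients
    fp16=True,                        # mixed precision saves memory on recent GPUs
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # default collator will pad batches dynamically
)
trainer.train()
```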