Submitted by fredlafrite t3_106no9h in MachineLearning
NichtMarlon t1_j3ikrkw wrote
Yes, it's very useful for text classification tasks. Big transformers get the highest accuracy, but we can't deploy them because they are too slow. So we distil knowledge from the bigger transformers into smaller transformers or CNNs. If you have a decent amount of unlabeled data to pseudo-label with the teacher, there is barely any loss in accuracy for the student model.
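To make the pseudo-labeling step concrete, here is a rough sketch in plain Python, not the exact pipeline described above. It assumes the teacher and student both output per-class logits; the temperature of 2.0, the function names, and the toy logits are all made up for illustration. The teacher's softened probabilities serve as soft pseudo-labels for unlabeled texts, and the student is scored by cross-entropy against them.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one row of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label(teacher_logits, temperature=2.0):
    """Turn teacher logits for unlabeled examples into soft pseudo-labels."""
    return [softmax(row, temperature) for row in teacher_logits]

def distillation_loss(student_logits, soft_targets, temperature=2.0):
    """Mean cross-entropy of student predictions against teacher soft targets."""
    total = 0.0
    for s_row, t_row in zip(student_logits, soft_targets):
        s_probs = softmax(s_row, temperature)
        total -= sum(t * math.log(p) for t, p in zip(t_row, s_probs))
    return total / len(student_logits)

# Hypothetical teacher logits for three unlabeled texts, two classes.
teacher_logits = [[4.0, -1.0], [0.5, 0.2], [-3.0, 2.0]]
soft_targets = pseudo_label(teacher_logits)

# A student that agrees with the teacher vs. one that disagrees.
matching_student = [[4.0, -1.0], [0.5, 0.2], [-3.0, 2.0]]
poor_student = [[-1.0, 4.0], [0.2, 0.5], [2.0, -3.0]]

loss_match = distillation_loss(matching_student, soft_targets)
loss_poor = distillation_loss(poor_student, soft_targets)
print(loss_match < loss_poor)
```

In a real setup the student would be trained by gradient descent to minimize this loss over the pseudo-labeled data, usually with a deep learning framework rather than hand-written loops.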