
NichtMarlon t1_j3ikrkw wrote on January 8, 2023 at 8:49 PM

Yes, it's very useful for text classification tasks. Big transformers get the highest accuracy, but we can't deploy them because they are too slow. So we distil knowledge from the bigger transformers into smaller transformers or CNNs. If you have a decent amount of unlabeled data to pseudo-label with the teacher, there is barely any loss in accuracy for the student model.
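
To make the pseudo-labeling idea concrete, here is a rough sketch in plain Python. It is not our actual pipeline: `toy_teacher`, the example texts, and the temperature of 2.0 are all made up for illustration, and a real setup would use a deep learning framework and an actual teacher model. It only shows the two pieces: the teacher turns unlabeled text into soft targets, and the student is trained to match them with a temperature-scaled KL loss.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 flattens the distribution, exposing the teacher's
    # relative confidence in the non-argmax classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label(teacher, texts, temperature=2.0):
    # Run the teacher over unlabeled texts and keep its soft class
    # probabilities as training targets for the student.
    return [(text, softmax(teacher(text), temperature)) for text in texts]

def distillation_loss(student_logits, teacher_probs, temperature=2.0):
    # KL(teacher || student) on the softened distributions. The T^2 factor
    # is the usual scaling that keeps gradient magnitudes comparable
    # across temperatures.
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    return kl * temperature ** 2

def toy_teacher(text):
    # Hypothetical stand-in for a large transformer: returns two-class logits.
    return [2.0, -1.0] if "good" in text else [-1.0, 2.0]

unlabeled = ["good movie", "bad plot"]  # made-up unlabeled examples
soft_targets = pseudo_label(toy_teacher, unlabeled)
```

In training you would minimise `distillation_loss` over the student's logits for each pseudo-labeled example; the loss is zero when the student reproduces the teacher's softened distribution and grows as they diverge.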