NichtMarlon t1_j3ikrkw wrote
Reply to [D] Have you ever used Knowledge Distillation in practice? by fredlafrite
Yes, it's very useful for text classification tasks. Large transformers get the highest accuracy, but we can't deploy them because they're too slow. So we distill knowledge from the bigger transformers into smaller transformers or CNNs. If you have a decent amount of unlabeled data to pseudo-label with the teacher, the student model loses barely any accuracy. A minimal sketch of what that looks like is below.
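For anyone curious, here's a rough sketch of the distillation step described above (assuming PyTorch and a standard Hinton-style soft-label KL loss; the teacher/student models, batch keys, and temperature value are placeholders, not anything specific from my setup):

```python
import torch
import torch.nn.functional as F

T = 2.0  # softmax temperature; higher values give softer teacher distributions

def distill_step(teacher, student, optimizer, batch):
    """One training step: teacher pseudo-labels the (unlabeled) batch,
    student is trained to match the teacher's softened distribution."""
    teacher.eval()
    with torch.no_grad():
        # Soft pseudo-labels from the large teacher model (logits assumed).
        teacher_logits = teacher(batch["input_ids"])

    student.train()
    student_logits = student(batch["input_ids"])

    # KL divergence between temperature-softened distributions,
    # scaled by T^2 as in the original distillation formulation.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

If you do have some labeled data as well, you can mix this soft-label loss with the usual cross-entropy on the hard labels, but on purely pseudo-labeled data the KL term alone works fine in my experience.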