spurious_waffles t1_ivztdom wrote on November 11, 2022 at 9:00 PM

Reply to comment by ichiichisan in [D] Regularization & augmentation for NLP finetuning by ichiichisan

There is a ton of research on denoising objectives in NLP.

Best of luck!

spurious_waffles t1_ivzflun wrote on November 11, 2022 at 7:26 PM

Reply to [D] Regularization & augmentation for NLP finetuning by ichiichisan

You could try very small character-level perturbations of your input, such as deletions, repetitions, and character swaps. You just need to be careful not to change the semantic meaning of your input text.
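
Here is a rough sketch of what I mean, in plain Python. The function name `perturb` and the `rate` and `seed` parameters are just placeholders I made up for illustration, not from any library. It only touches letters, so digits, spaces, and punctuation pass through unchanged, which is one cheap way to lower the risk of changing the meaning.

```python
import random


def perturb(text, rate=0.05, seed=None):
    """Randomly delete, repeat, or swap letters in `text`.

    `rate` is the per-letter probability of applying a perturbation
    (an illustrative default, not a tuned value).
    """
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        # Only letters are candidates; everything else is copied as-is.
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("delete", "repeat", "swap"))
            if op == "delete":
                i += 1
                continue
            if op == "repeat":
                out.extend((c, c))
                i += 1
                continue
            # Swap with the next character only if it is also a letter,
            # so words never bleed across spaces or punctuation.
            if i + 1 < len(chars) and chars[i + 1].isalpha():
                out.extend((chars[i + 1], c))
                i += 2
                continue
        out.append(c)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(perturb("The movie was surprisingly good.", rate=0.1, seed=0))
```

Keep `rate` low and eyeball a sample of the outputs before training on them, since even a single deletion can flip a label on short inputs.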

There's some research out there showing that BERT-like models break down on standard benchmarks when the benchmark text contains a small amount of character-level noise.