Submitted by ichiichisan t3_ys974h in MachineLearning
spurious_waffles t1_ivzflun wrote
You could try very small character-level perturbations of your input, such as deletions, repetitions, and character swaps. You just need to be careful not to change the semantic meaning of your input text.
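A minimal sketch of what such perturbations could look like (the function name, the per-character probability `p`, and the choice of operations are illustrative assumptions, not a published recipe):

```python
import random

def perturb(text, p=0.05, seed=None):
    """Apply small character-level noise: deletions, repetitions,
    and adjacent-character swaps.

    p is the per-character probability of applying an operation
    (an illustrative default, not a tuned value). Spaces are left
    alone to keep the text roughly readable."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c != " " and rng.random() < p:
            op = rng.choice(["delete", "repeat", "swap"])
            if op == "delete":
                i += 1          # drop this character
                continue
            if op == "repeat":
                out.append(c)   # duplicate this character
                out.append(c)
            elif i + 1 < len(chars):  # swap with the next character
                out.append(chars[i + 1])
                out.append(c)
                i += 2
                continue
            else:
                out.append(c)   # swap impossible at end of string
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

Keeping `p` small is what preserves the semantics: at, say, 5% per character, most words survive intact and the noise resembles natural typos.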
There's some research out there showing that BERT-like models break down on standard benchmarks when the benchmark text contains a small amount of character-level noise.
ichiichisan OP t1_ivznfay wrote
Thanks, but I am not looking for suggestions, rather for something that has been proven to work, ideally with research behind it.
It is quite common knowledge that randomly altering input text does not help with finetuning on NLP tasks.
spurious_waffles t1_ivztdom wrote
There is a ton of research on denoising objectives in NLP.
Best of luck!