Submitted by ichiichisan t3_ys974h in MachineLearning
I am trying to research methods that work well for regularization in small-data NLP finetuning scenarios, specifically for regression.
Coming from a computer vision background, it appears to me that no established method has emerged that works well across tasks, and it is really hard to combat severe overfitting on small-data tasks.
I am looking specifically for methods particular to NLP finetuning that go beyond classical DL regularization techniques like dropout or weight decay.
Happy for any pointers!
Nameless1995 t1_iw05d36 wrote
There isn't an established standard AFAIK.
EDA is a simple baseline for augmentation: https://arxiv.org/abs/1901.11196
(see citations in google scholar for recent ones).
(Recent ones are playing around with counterfactual augmentation and such, but I'm not sure any standard, stable technique has emerged.)
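For reference, here is a rough sketch of two of the four EDA operations (random swap and random deletion; the WordNet-based synonym replacement and insertion are omitted). The function names and the alpha value are just illustrative, and for regression you would typically reuse the original target for each augmented sentence:

```python
# Minimal sketch of two EDA operations from Wei & Zou (2019).
# The full method also does WordNet-based synonym replacement and random insertion.
import random

def random_swap(words, n_swaps=1):
    # Swap two random word positions, n_swaps times.
    words = words[:]
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    # Drop each word with probability p, keeping at least one word.
    if len(words) == 1:
        return words
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

def eda_augment(sentence, n_aug=4, alpha=0.1):
    words = sentence.split()
    n = max(1, int(alpha * len(words)))
    augmented = []
    for _ in range(n_aug):
        op = random.choice([
            lambda w: random_swap(w, n_swaps=n),
            lambda w: random_deletion(w, p=alpha),
        ])
        augmented.append(" ".join(op(list(words))))
    return augmented

print(eda_augment("finetuning on small data overfits very quickly"))
```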
This one had nice low resource performance: https://arxiv.org/pdf/2106.05469.pdf
Also this: https://aclanthology.org/2021.emnlp-main.749.pdf (you can find some new stuff from citations in google scholar/semantic scholar).
I think Prompt Tuning and Contrastive Learning (https://openreview.net/pdf?id=cu7IUiOhujH) also showed better performance in very low-resource settings, but the benefit tapers off as you add more data.
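As a rough illustration of what a contrastive auxiliary loss can look like in a regression finetuning setup (this is a SimCSE-style sketch, not the exact recipe from the paper above): two forward passes with different dropout masks give positive pairs, and the rest of the batch serves as in-batch negatives. The toy encoder, temperature, and the weight `lam` are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.05):
    # z1[i] and z2[i] are a positive pair; all other rows act as negatives.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

class ToyRegressor(nn.Module):
    # Stand-in for a pretrained encoder (pooled output) + regression head.
    def __init__(self, dim_in=32, dim_h=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, dim_h), nn.ReLU(), nn.Dropout(0.1))
        self.head = nn.Linear(dim_h, 1)

    def forward(self, x):
        return self.encoder(x)  # pooled "sentence" embedding

def training_step(model, x, targets, lam=0.1):
    emb1 = model(x)  # different dropout masks make these two "views"
    emb2 = model(x)
    reg_loss = F.mse_loss(model.head(emb1).squeeze(-1), targets)
    ctr_loss = info_nce(emb1, emb2)
    return reg_loss + lam * ctr_loss

model = ToyRegressor()
loss = training_step(model, torch.randn(8, 32), torch.randn(8))
loss.backward()
```

In practice you would swap the toy encoder for a pretrained model's pooled output and tune lam/temperature on a dev set.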
If you are after adversarial robustness, there are other techniques for that too. I think FreeLB was popular a while ago. There's also SAM for flatter minima.
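To give a feel for the adversarial-training flavor: the gradient of the loss w.r.t. the word embeddings defines a small perturbation, and you train on clean plus perturbed inputs. This is a single-step FGM-style sketch; FreeLB proper accumulates several ascent steps inside an L2 ball around the embeddings, and SAM instead perturbs the weights. The toy model and eps are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTextRegressor(nn.Module):
    # Stand-in for a pretrained transformer + regression head.
    def __init__(self, vocab=100, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, inputs_embeds):
        # Mean-pool token embeddings, then regress a scalar.
        return self.head(inputs_embeds.mean(dim=1)).squeeze(-1)

def adversarial_loss(model, token_ids, targets, eps=1e-2):
    embeds = model.emb(token_ids)
    clean_loss = F.mse_loss(model(embeds), targets)
    # Gradient w.r.t. the embeddings gives the perturbation direction.
    grad, = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    norm = grad.flatten(start_dim=1).norm(dim=1).view(-1, 1, 1) + 1e-12
    adv_loss = F.mse_loss(model(embeds + eps * grad / norm), targets)
    return clean_loss + adv_loss

model = ToyTextRegressor()
loss = adversarial_loss(model, torch.randint(0, 100, (4, 10)), torch.randn(4))
loss.backward()
```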