Gere1 t1_iv0505o wrote

Does anyone know of a good ablation study of the techniques mentioned? I've seen results where neither dropout nor layer normalization made much difference, so I wonder whether these two techniques are just a matter of belief or still crucial.
