SeucheAchat9115 t1_izdbkkz wrote

Try using smaller subsets of your data. It is very likely that performance scales with the amount of data, so results on a subset carry over once you train on the full dataset.
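A rough sketch of what that loop can look like in PyTorch; `full_dataset` and `train_and_evaluate` are hypothetical placeholders for your own dataset and training pipeline:

```python
# Sketch: train on nested subsets and check how validation performance
# scales with data size before committing to full-data runs.
import random
from torch.utils.data import Subset

random.seed(0)
indices = list(range(len(full_dataset)))  # full_dataset: your Dataset (placeholder)
random.shuffle(indices)

for fraction in (0.01, 0.05, 0.1, 0.25):
    n = int(fraction * len(indices))
    subset = Subset(full_dataset, indices[:n])  # nested subsets: small is contained in large
    val_score = train_and_evaluate(subset)      # your training loop (placeholder)
    print(f"{fraction:.0%} of data ({n} examples): val score = {val_score:.4f}")
```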

11

fasttosmile t1_izgxj4n wrote

Careful. There are literally dozens of language-modeling papers that report improvements on PTB (Penn Treebank) which do not carry over to larger datasets.

3

farmingvillein t1_izi021q wrote

True, but no one has really come up with a better methodology.

The best you can do is train on smaller data + make sure that you can tell yourself a story about how the new technique will still help when data is scaled up (and then hope that you are right).

(The latter is certainly an argument for staying at least semi-current with the literature, as it will help you build an intuition for what might scale up and what probably won't.)

2

SeucheAchat9115 t1_izdbmzj wrote

Or you could compare your training runs after, e.g., two epochs and only run the best one for 500 epochs.
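A minimal sketch of that screen-then-commit idea, assuming a higher `evaluate()` score is better; `make_model`, `train`, `evaluate`, and `configs` are hypothetical stand-ins for your own code:

```python
# Sketch: screen every candidate cheaply, then spend the full
# training budget only on the early leader.
SCREEN_EPOCHS = 2
FULL_EPOCHS = 500

candidates = []
for config in configs:               # configs: your candidate hyperparameters
    model = make_model(config)
    train(model, epochs=SCREEN_EPOCHS)            # cheap screening run
    candidates.append((evaluate(model), config))

best_score, best_config = max(candidates, key=lambda t: t[0])
model = make_model(best_config)                   # retrain from scratch...
train(model, epochs=FULL_EPOCHS)                  # ...for the full budget
```

One caveat worth keeping in mind: rankings after two epochs can differ from final rankings, so this trades some reliability for compute.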

1