SeucheAchat9115 t1_izdbkkz wrote

Try using smaller subsets of your data. It is very likely that performance scales with the amount of data, so results on a subset carry over once you train on the full dataset.
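A rough sketch of what that loop can look like in PyTorch; `full_dataset` and `train_and_evaluate` are hypothetical placeholders for your own dataset and training pipeline:

```python
# Sketch: train on nested subsets and check how validation performance
# scales with data size before committing to full-data runs.
import random
from torch.utils.data import Subset

random.seed(0)
indices = list(range(len(full_dataset)))  # full_dataset: your Dataset (placeholder)
random.shuffle(indices)

for fraction in (0.01, 0.05, 0.1, 0.25):
    n = int(fraction * len(indices))
    subset = Subset(full_dataset, indices[:n])  # nested subsets: small is contained in large
    val_score = train_and_evaluate(subset)      # your training loop (placeholder)
    print(f"{fraction:.0%} of data ({n} examples): val score = {val_score:.4f}")
```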

11

fasttosmile t1_izgxj4n wrote

Careful. There are literally dozens of language-modeling papers that report improvements on PTB (Penn Treebank) which do not carry over to larger datasets.

3

farmingvillein t1_izi021q wrote

True, but no one has really come up with a better methodology.

The best you can do is train on smaller data + make sure that you can tell yourself a story about how the new technique will still help when data is scaled up (and then hope that you are right).

(The latter is certainly an argument for staying at least semi-current with the literature, as it will help you build an intuition for what might scale up and what probably won't.)

2

SeucheAchat9115 t1_izdbmzj wrote

Or you could compare your training runs after, e.g., two epochs and only run the best one for 500 epochs.
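A minimal sketch of that screen-then-commit idea, assuming a higher `evaluate()` score is better; `make_model`, `train`, `evaluate`, and `configs` are hypothetical stand-ins for your own code:

```python
# Sketch: screen every candidate cheaply, then spend the full
# training budget only on the early leader.
SCREEN_EPOCHS = 2
FULL_EPOCHS = 500

candidates = []
for config in configs:               # configs: your candidate hyperparameters
    model = make_model(config)
    train(model, epochs=SCREEN_EPOCHS)            # cheap screening run
    candidates.append((evaluate(model), config))

best_score, best_config = max(candidates, key=lambda t: t[0])
model = make_model(best_config)                   # retrain from scratch...
train(model, epochs=FULL_EPOCHS)                  # ...for the full budget
```

One caveat worth keeping in mind: rankings after two epochs can differ from final rankings, so this trades some reliability for compute.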

1