
techlos t1_ir131zp wrote

Two things you can do are early stopping and training on a subset of your dataset.

In my experience, hyperparameters that converge best within 3 to 5 epochs usually also converge well on a full training run. That doesn't guarantee the best final performance, but if you're on a budget it's a great compromise.
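
Roughly, that workflow could look like the sketch below. It is only an illustration under stated assumptions: a toy 1-D linear regression trained with plain SGD in pure Python, where the helper names (`make_data`, `train`, `proxy_search`), the candidate learning rates, the 20% subset, the 4-epoch budget, and the patience of 3 are all made up for the example.

```python
import random


def make_data(n, seed=0):
    """Toy data (assumption): y = 3x + 0.5 plus a little Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 0.5 + rng.gauss(0.0, 0.05)) for x in xs]


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on `data`."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, val, lr, max_epochs, patience=None):
    """SGD on a linear model; returns the best validation loss seen.

    If `patience` is set, stop early once the validation loss has failed
    to improve for that many consecutive epochs.
    """
    w, b = 0.0, 0.0
    best, stale = float("inf"), 0
    for _ in range(max_epochs):
        for x, y in data:
            err = w * x + b - y
            w -= lr * err * x
            b -= lr * err
        loss = mse(w, b, val)
        if loss < best - 1e-6:
            best, stale = loss, 0
        else:
            stale += 1
            if patience is not None and stale >= patience:
                break
    return best


def proxy_search(train_set, val, candidates, subset_frac=0.2, epochs=4):
    """Score each candidate with a short run on a subset of the data."""
    subset = train_set[: max(1, int(len(train_set) * subset_frac))]
    scores = {lr: train(subset, val, lr, epochs) for lr in candidates}
    return min(scores, key=scores.get), scores


train_set = make_data(1000, seed=0)
val_set = make_data(200, seed=1)
candidates = [0.0001, 0.01, 0.3]  # hypothetical learning rates to compare

# Cheap pass: few epochs, small subset, pick the best-converging candidate.
best_lr, scores = proxy_search(train_set, val_set, candidates)

# Expensive pass: full data with the winner, with early stopping as a cap.
final_loss = train(train_set, val_set, best_lr, max_epochs=50, patience=3)
print(best_lr, final_loss)
```

The cheap pass only ranks the candidates; the winner still gets a full run afterwards, so a bad ranking costs you some final performance rather than a broken model.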
