Submitted by AutoModerator t3_10cn8pw in MachineLearning
ant9zzzzzzzzzz t1_j6a37a1 wrote
Is there research on the order of training examples, or on running epochs on subsets of the data rather than the full training set at a time?
I was thinking about how people learn better when focusing on one problem at a time until grokking it, rather than randomly learning things across different domains.
For example, train for some epochs on one label type, then another, rather than mixing all the data in the same epoch.
This is also related to stateful retraining, as one typically does professionally: you have an existing model checkpoint and retrain it on new data. How does that compare to retraining on all the data from scratch?
mahnehsilla t1_j6agijb wrote
Whether you feed the data in batches or item by item shouldn't matter beyond speed, as long as you shuffle it (which is best practice).
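To illustrate the shuffling point: a minimal sketch (using NumPy on a made-up toy dataset) of the standard practice of re-shuffling before each epoch, so the composition of each minibatch changes from pass to pass:

```python
import numpy as np

# Hypothetical toy dataset: 8 examples, 2 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = rng.integers(0, 2, size=8)

batch_size = 4
for epoch in range(2):
    # Reshuffle every epoch so batch composition differs each pass.
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        # ... one gradient step on (xb, yb) would go here ...
```

Every example is still visited exactly once per epoch; only the order (and hence the batch grouping) changes.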
trnka t1_j6ceex5 wrote
I think curriculum learning is the name. Here's a recent survey. I've seen it in NLP tasks where it can help to do early epochs on short inputs. Kinda like starting kids with short sentences.
I haven't heard of anyone adjusting the labels at each stage of curriculum learning though.
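A rough sketch of what "short inputs in early epochs" could look like in practice. This is an assumption about one common curriculum-learning recipe (token count as a difficulty proxy, with the visible fraction of the easy-to-hard ordering growing each epoch); the function and variable names are hypothetical, not from any specific library:

```python
# Toy text-classification data; difficulty proxy = token count.
texts = [
    "a cat sat",
    "the quick brown fox jumps over the lazy dog",
    "hello",
    "curriculum learning orders examples from easy to hard",
]
labels = [0, 1, 0, 1]

# Easy-to-hard ordering: shortest inputs first.
order = sorted(range(len(texts)), key=lambda i: len(texts[i].split()))

def curriculum_subset(epoch, total_epochs=4):
    # Grow the visible fraction of the ordered data each epoch.
    k = max(1, round(len(order) * (epoch + 1) / total_epochs))
    return [(texts[i], labels[i]) for i in order[:k]]

# Epoch 0 trains only on the shortest example; by the final
# epoch the whole dataset is in play.
```

Note that the label set stays fixed here; what the commenter above asks about (changing labels per stage) would be a different axis of curriculum design.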
ant9zzzzzzzzzz t1_j6dmb28 wrote
Thank you!