Submitted by twocupv60 t3_xvem36 in MachineLearning
king_of_walrus t1_ir3egvh wrote
I have a similar problem - some of my models have taken upwards of 10 days to train! So, I have developed a strategy that is working reasonably well.
First, I work with image data, and I always start by training and evaluating models at a lower resolution. For example, if I were using the CelebA-HQ dataset I would do all initial development with 128x128 images, then scale up the resolution once my results are good. Oftentimes things translate reasonably well when scaling up, and this allows for much more rapid prototyping.
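To make that concrete, here’s a minimal sketch of the kind of low-res-first pipeline I mean (PyTorch/torchvision; the dataset path, crop size, normalization, and batch size are just placeholders, not any particular setup):

```python
# Minimal sketch of low-resolution-first prototyping (PyTorch/torchvision).
# The dataset path and normalization are placeholders, not a specific recipe.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_loader(image_size: int, batch_size: int = 64) -> DataLoader:
    """Build a loader at the requested resolution; bump image_size once results look good."""
    tfm = transforms.Compose([
        transforms.Resize(image_size),
        transforms.CenterCrop(image_size),
        transforms.ToTensor(),
        transforms.Normalize([0.5] * 3, [0.5] * 3),
    ])
    ds = datasets.ImageFolder("data/celeba_hq", transform=tfm)  # placeholder path
    return DataLoader(ds, batch_size=batch_size, shuffle=True, num_workers=4)

# Prototype at 128x128, then rerun the exact same training code at full resolution.
loader_128 = make_loader(128)
# loader_1024 = make_loader(1024)  # only once the 128x128 results are where you want them
```

The point is that everything downstream of the loader stays identical, so switching resolutions is a one-line change.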
Another strategy that has worked well for me is fine-tuning. I train a base model with “best guess” hyperparameters to completion. Then I fine-tune for a quarter of the total training time, modifying one hyperparameter of interest while keeping everything else the same. For my work, that has been enough time to see the effects of the change and to pick clear winners. In a few cases, I have been able to verify my fine-tuning results by training the model from scratch under the different configurations - this is what gives me confidence in the approach. I find that this strategy still works when I have hyperparameters that interact with one another; holding one constant and optimizing the other works pretty well to balance them.
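Roughly, the sweep looks like this; build_model, train_loader, and evaluate are hypothetical stand-ins, and the learning rate is just an example of the hyperparameter being varied:

```python
# Rough sketch of the fine-tuning sweep. build_model, train_loader, and evaluate
# are hypothetical stand-ins, and learning rate is just an example hyperparameter.
import copy
import torch

def finetune(base_state, model_fn, loader, lr, steps):
    """Resume from the fully trained base checkpoint and train for a fraction of its budget."""
    model = model_fn()
    model.load_state_dict(copy.deepcopy(base_state))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    data = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data)
        except StopIteration:
            data = iter(loader)
            x, y = next(data)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model

# base_steps = 200_000                                  # however long the base run was
# base_state = torch.load("base_checkpoint.pt")         # "best guess" model, trained to completion
# for lr in (1e-4, 3e-4, 1e-3):                         # vary exactly one hyperparameter
#     candidate = finetune(base_state, build_model, train_loader, lr, steps=base_steps // 4)
#     evaluate(candidate)                               # compare candidates and pick a winner
```

Each candidate starts from the same checkpoint, so the only thing differing between runs is the hyperparameter you changed.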
I should note that you probably don’t need to tune most hyperparameters, unless it’s one you are adding yourself. If it isn’t something novel, I feel like there is bound to be a reference in the literature that has the settings you’re looking for. This is worth keeping in mind, I think.
Overall, it’s not really worth going to great lengths to tune things unless your results are really bad or you’re being edged out by a competitor. However, if your results are that bad, it probably speaks to a larger issue.