
VirtualHat t1_ir3tiey wrote

Here are some options:

  1. Tune a smaller network, then apply the hyperparameters to the larger one and 'hope for the best'.
  2. As others have said, train for less time, e.g. 10 epochs rather than 100. I typically find this gives misleading results, though: the eventual best performer is often a poor one early on.
  3. For low-dimensional searches (say 2D), run a very coarse grid search, with samples spaced an order of magnitude apart (maybe two), then just take the best model. This is often the best method, since you don't want to over-tune the hyperparameters anyway (see the first sketch below).
  4. For high-dimensional searches, use random search, then marginalize over all but one parameter using the mean of the best 5 runs. This works really well (see the second sketch below).
  5. Often the goal is to compare two methods rather than to maximize the score; in that case you can just use other people's hyperparameters.
  6. Bayesian optimization is usually not worth the time. In low dimensions do grid search; in high dimensions do random search.
  7. If you have the resources, train your models in parallel. This is a really easy way to make use of multiple GPUs if you have them (see the last sketch below).
  8. In some cases you can early-stop runs that are clearly not working, though I try to avoid doing this.
  9. When I do HPS, I do it on a different dataset than my main one, which makes things quicker. I'm doing RL though, so it's a bit different I guess.
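
A minimal sketch of the coarse grid search from point 3. `train_and_eval` is a hypothetical placeholder standing in for a real training run, and the search ranges are made up; the point is just the order-of-magnitude spacing:

```python
import itertools

import numpy as np

def train_and_eval(lr, weight_decay):
    # Hypothetical stand-in for a real training run: takes hyperparameters,
    # returns a validation score (higher is better). Replace with your own.
    return -((np.log10(lr) + 3) ** 2 + (np.log10(weight_decay) + 4) ** 2)

# Coarse grid: values spaced an order of magnitude apart, as in point 3.
lrs = np.logspace(-5, -1, num=5)            # 1e-5, 1e-4, ..., 1e-1
weight_decays = np.logspace(-6, -2, num=5)  # 1e-6, 1e-5, ..., 1e-2

results = [((lr, wd), train_and_eval(lr, wd))
           for lr, wd in itertools.product(lrs, weight_decays)]
(best_lr, best_wd), best_score = max(results, key=lambda r: r[1])
print(f"best: lr={best_lr:.0e}, wd={best_wd:.0e}, score={best_score:.2f}")
```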
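
And a sketch of the random-search-then-marginalize idea from point 4, here marginalizing onto the learning rate. Again, `sample_config` and `train_and_eval` are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_config(rng):
    # Sample each hyperparameter independently (log-uniform where sensible).
    return {
        "lr": 10 ** rng.uniform(-5, -1),
        "batch_size": int(2 ** rng.integers(4, 9)),  # 16 ... 256
        "dropout": rng.uniform(0.0, 0.5),
    }

def train_and_eval(cfg):
    # Placeholder for a real training run returning a validation score.
    return -((np.log10(cfg["lr"]) + 3) ** 2) - cfg["dropout"]

runs = [(cfg, train_and_eval(cfg))
        for cfg in (sample_config(rng) for _ in range(100))]

# Marginalize over everything except lr: bucket the runs by lr decade,
# then report the mean of the best 5 scores in each bucket.
buckets = {}
for cfg, score in runs:
    decade = int(np.floor(np.log10(cfg["lr"])))
    buckets.setdefault(decade, []).append(score)

for decade in sorted(buckets):
    top5 = sorted(buckets[decade], reverse=True)[:5]
    print(f"lr ~ 1e{decade}: mean of best 5 runs = {np.mean(top5):.3f}")
```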
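
Finally, for point 7, one simple way to spread trials across GPUs is a process pool that pins each worker to a device via `CUDA_VISIBLE_DEVICES`. This assumes one trial fits on one GPU, and that you import your DL framework inside the worker so it actually sees the variable:

```python
import multiprocessing as mp
import os

def run_trial(args):
    gpu_id, cfg = args
    # Pin this worker to one GPU. Import your DL framework *after* this
    # line (inside the worker) so it only sees the assigned device.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # ... build the model, train, evaluate ...
    score = 0.0  # placeholder for the real validation score
    return cfg, score

if __name__ == "__main__":
    n_gpus = 4  # assumption: adjust to your machine
    configs = [{"lr": lr} for lr in (1e-4, 1e-3, 1e-2, 1e-1)]
    jobs = [(i % n_gpus, cfg) for i, cfg in enumerate(configs)]
    with mp.Pool(processes=n_gpus) as pool:
        for cfg, score in pool.map(run_trial, jobs):
            print(cfg, score)
```

With more configs than GPUs, the pool just queues the extra trials, so it degrades gracefully.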