
artsybashev t1_iw29zh1 wrote

A lot of deep learning has been the modern equivalent of witchcraft: just some ideas that might make sense, squashed together.

Hyperparameter tuning is one of the most obscure and hardest-to-learn parts of neural network training, since it's impractical to do multiple runs for models that take more than a few weeks or thousands of dollars to train. Most researchers have just learned some good initial guesses and run the model with a handful of hyperparameter sets, from which the best result is chosen.
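A minimal sketch of that "pick the best of a few runs" approach, basically random search with a tiny trial budget. `train_and_evaluate` and the search space are made up for illustration; in reality each call is a full, expensive training run:

```python
import random

# Placeholder for an expensive training run that returns a validation
# metric; a real call could take weeks or thousands of dollars.
def train_and_evaluate(lr, batch_size, weight_decay):
    # Dummy score so the sketch runs end to end; a real run would
    # train a network with these hyperparameters and evaluate it.
    return random.random()

# Candidate values for each hyperparameter (illustrative only).
search_space = {
    "lr": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64, 128],
    "weight_decay": [0.0, 0.01, 0.1],
}

best_score, best_config = float("-inf"), None
for _ in range(5):  # only a handful of trials, since each one is a full run
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_evaluate(**config)
    if score > best_score:
        best_score, best_config = score, config

print("best config:", best_config, "score:", best_score)
```

With a budget of five runs you're mostly relying on those "good initial guesses" to put the search space in the right ballpark to begin with. See the second sketch below for the cheaper alternative.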

Some of the hyperparameter tuning can also be done on a smaller model, so the amount of tuning needed can be reduced as the model is grown to the target size.
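Roughly, that looks like this: search cheaply on a small "proxy" model, then reuse the winner for one expensive run at full size. The widths, search space, and `train_and_evaluate` are again assumptions for illustration; work like the muTransfer paper tries to make this kind of transfer principled by parameterizing the network so the optimal hyperparameters stay stable as width grows:

```python
import random

# Placeholder for a training run at a given model width; made up
# for illustration, returns a dummy validation metric.
def train_and_evaluate(width, lr):
    return random.random()

search_space = {"lr": [1e-4, 3e-4, 1e-3, 3e-3]}
proxy_width, target_width = 256, 4096  # assumed sizes

# Many trials are affordable on the small proxy model...
best_score, best_config = float("-inf"), None
for _ in range(20):
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_evaluate(proxy_width, **config)
    if score > best_score:
        best_score, best_config = score, config

# ...then a single expensive run at the target size with the winner.
final_score = train_and_evaluate(target_width, **best_config)
print("transferred config:", best_config, "final score:", final_score)
```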
