Submitted by NinjaUnlikely6343 t3_10kecyc in deeplearning

Hello all!

Absolute noob here. I'm trying to optimize an image classifier using transfer learning from InceptionV3 (last layer being 'Mixed 7') and fine-tuned with a small convolutional network on top. So far, I find that changing hyperparameters yields modest (if any) changes in performance and each attempt takes a prohibitive amount of time. I was thus wondering if there were any way to systematically test out multiple changes in hyperparameters without just manually changing one at a time in incremental fashion.

17

Comments


suflaj t1_j5qb32y wrote

There is this: https://www.microsoft.com/en-us/research/blog/%C2%B5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks/

However, it's unlikely to help in your case. The best thing you can do is grid search if you know something about the problem, or just random search. I prefer random search even if I'm an expert on the problem, ESPECIALLY with ML models.

But I'm curious why it takes so long. You don't have to train on the whole dataset. Take 10% for training and 10% for validation, or less if the dataset is huge. You just need enough data to learn something. Then your optimal hyperparameters are a good enough approximation.

Also, it might help to just not tune redundant hyperparameters. Layer sizes are usually redundant, as is almost any hyperparameter in the Adam family of optimizers besides the learning rate and, to a lesser extent, the first momentum. Which ones are you optimizing?
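
For illustration, here's a rough random-search sketch in the spirit of the advice above, assuming a Keras/TensorFlow setup like the OP's (frozen InceptionV3 cut at 'mixed7'); the binary head and the `x_train`/`y_train` arrays are stand-ins, not anyone's actual code:

```python
import random
import tensorflow as tf

def build_model(learning_rate, dropout_rate):
    # Frozen InceptionV3 trunk cut at the 'mixed7' layer, as in the OP's setup.
    base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
    base.trainable = False
    features = base.get_layer("mixed7").output
    x = tf.keras.layers.GlobalAveragePooling2D()(features)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # binary head as a stand-in
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Tune on ~10% of the data so each trial is cheap, as suggested above.
# x_train / y_train are placeholders for your own arrays.
n = len(x_train) // 10
x_small, y_small = x_train[:n], y_train[:n]

results = []
for trial in range(10):
    lr = 10 ** random.uniform(-5, -2)      # sample the learning rate on a log scale
    dropout = random.uniform(0.1, 0.5)
    model = build_model(lr, dropout)
    history = model.fit(x_small, y_small, validation_split=0.2,
                        epochs=3, batch_size=32, verbose=0)
    results.append((max(history.history["val_accuracy"]), lr, dropout))

print(sorted(results, reverse=True)[:3])   # best few configurations
```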

4

NinjaUnlikely6343 OP t1_j5r4d3x wrote

Thanks a lot for the detailed response! Didn't know you could tune on a portion of the dataset and expect roughly the same results as with the whole set. I'm currently just testing different learning rates, but I thought about having a go at the dropout rate as well.

2

suflaj t1_j5r5bfw wrote

For the learning rate you should just use a good starting point based on the batch size and architecture and relegate everything else to the scheduler and optimizer. I don't think there's any point messing with the learning rate once you find one that doesn't blow up your model; just use warmup or plateau schedulers to manage it for you after that.

Since you mentioned Inception, I believe that unless you are using quite big batch sizes, your starting LR should be the magical 3e-4 for Adam or 1e-2 for SGD, and you would just use a ReduceLROnPlateau scheduler with e.g. a patience of 3 epochs, cooldown of 2, factor of 0.1, and probably employ EarlyStopping if the metric doesn't improve after 6 epochs.
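
As a concrete sketch of that callback setup, shown with Keras callbacks since the OP is going through InceptionV3 in Keras (`model`, `train_ds`, and `val_ds` are placeholders for your own pipeline, and the numbers just mirror the suggestion above):

```python
import tensorflow as tf

# Placeholders: `model`, `train_ds`, `val_ds` come from your own code.
model.compile(optimizer=tf.keras.optimizers.Adam(3e-4),   # the "magical" Adam starting LR
              loss="categorical_crossentropy", metrics=["accuracy"])

callbacks = [
    # Cut the LR by 10x after 3 epochs without improvement, then wait 2 epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                         patience=3, cooldown=2),
    # Stop training entirely if the metric hasn't improved for 6 epochs.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=6,
                                     restore_best_weights=True),
]

model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)
```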

2

thatpretzelife t1_j5sy10d wrote

As another option, if you haven't already, try looking into a cloud computing solution. For me it cut an image-processing uni assignment down from a couple of hours to a minute. Google Colab is free, or use something like Paperspace, which costs ~8 USD but is much faster.

2

ChingBlue t1_j5tdh79 wrote

Off the top of my head, you can use Grid Search to test hyperparameter combinations exhaustively, Random Search to sample them randomly, or neural search, which uses ML to drive the hyperparameter tuning itself. You can use tuner libraries for this as well.
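
If you want a library to drive the search, here's a rough sketch with KerasTuner's RandomSearch (assumes `pip install keras-tuner`; the toy model and `train_ds`/`val_ds` are stand-ins for your own code):

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # hp.Float defines the search space the tuner samples from.
    lr = hp.Float("learning_rate", 1e-5, 1e-2, sampling="log")
    dropout = hp.Float("dropout", 0.1, 0.5, step=0.1)
    # Toy stand-in classifier; swap in your InceptionV3-based model here.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(150, 150, 3)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=20, directory="tuning", project_name="inception")
tuner.search(train_ds, validation_data=val_ds, epochs=5)
print(tuner.get_best_hyperparameters(1)[0].values)
```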

9

emad_eldeen t1_j5uw5sp wrote

Wandb is the best! https://wandb.ai/

Check out the hyperparameter sweep option. It is FANTASTIC!
You can set a range or a list of values for each hyperparameter and let it run.
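
For example, a rough sweep sketch (hypothetical values; `train` is assumed to be your own training function that calls `wandb.init()` and logs `val_accuracy` with `wandb.log()`):

```python
import wandb

sweep_config = {
    "method": "random",   # also supports "grid" and "bayes"
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values",
                          "min": 1e-5, "max": 1e-2},
        "dropout": {"values": [0.1, 0.3, 0.5]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="inception-transfer")
wandb.agent(sweep_id, function=train, count=20)   # launch 20 trials
```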

2