Submitted by NinjaUnlikely6343 t3_10kecyc in deeplearning

Hello all!

Absolute noob here. I'm trying to optimize an image classifier built with transfer learning from InceptionV3 (cut at the 'mixed7' layer) and fine-tuned with a small convolutional network on top. So far, changing hyperparameters yields modest (if any) changes in performance, and each attempt takes a prohibitive amount of time. I was thus wondering whether there's any way to systematically test multiple hyperparameter changes without just manually changing one at a time in incremental fashion.
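For reference, a rough sketch of the kind of setup described above (the original code isn't shown in the post, so the input size, head layers, and class count here are placeholders):

```python
import tensorflow as tf

# Pretrained InceptionV3 backbone, frozen, cut at the 'mixed7' layer.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False

# Small trainable convolutional head on top of the frozen features.
x = base.get_layer("mixed7").output
x = tf.keras.layers.Conv2D(128, 3, activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.3)(x)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # placeholder class count

model = tf.keras.Model(base.input, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(3e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```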

17

Comments


ChingBlue t1_j5tdh79 wrote

Off the top of my head, you can use Grid Search to exhaustively test hyperparameter combinations, Random Search to sample hyperparameter values at random, or neural/Bayesian search, which uses ML to guide the hyperparameter tuning itself. You can use dedicated tuner libraries for this as well.
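For concreteness, here's a minimal Random Search sketch using the Keras Tuner library (assuming a Keras/TensorFlow setup like the one in the post; the class count, search ranges, and data variables are placeholders):

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # Frozen InceptionV3 backbone as in the post, with the dropout rate and
    # learning rate exposed as tunable hyperparameters.
    base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
    base.trainable = False
    x = tf.keras.layers.GlobalAveragePooling2D()(base.get_layer("mixed7").output)
    x = tf.keras.layers.Dropout(hp.Float("dropout", 0.2, 0.5, step=0.1))(x)
    outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # placeholder class count
    model = tf.keras.Model(base.input, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 3e-4, 1e-4])),
        loss="categorical_crossentropy",
        metrics=["accuracy"])
    return model

# Try 10 random combinations; kt.BayesianOptimization is the ML-guided alternative.
tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))
best_model = tuner.get_best_models(num_models=1)[0]
```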

9

suflaj t1_j5qb32y wrote

There is this: https://www.microsoft.com/en-us/research/blog/%C2%B5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks/

However, it's unlikely to help in your case. The best thing you can do is grid search if you know something about the problem, or just random search. I prefer random search even if I'm an expert on the problem, ESPECIALLY with ML models.

But I'm curious why it takes so long. You don't have to train on the whole dataset. Take 10% for training and 10% for validation, or less if the dataset is huge. You just need enough data to learn something. The optimal hyperparameters you find that way are a good enough approximation.
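A minimal sketch of that subsampling idea (x_all and y_all are hypothetical arrays holding the full labeled dataset):

```python
import numpy as np

rng = np.random.default_rng(42)
idx = rng.permutation(len(x_all))  # shuffle indices once, reproducibly

# Roughly 10% of the data for tuning-time training and another 10% for validation.
n_tenth = len(x_all) // 10
x_tune_train, y_tune_train = x_all[idx[:n_tenth]], y_all[idx[:n_tenth]]
x_tune_val, y_tune_val = x_all[idx[n_tenth:2 * n_tenth]], y_all[idx[n_tenth:2 * n_tenth]]
```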

Also, it might help to just not tune redundant hyperparameters. Layer sizes are usually redundant, as is almost any hyperparameter in the Adam family of optimizers besides the learning rate and, to a lesser extent, the first momentum. Which ones are you optimizing?

4

NinjaUnlikely6343 OP t1_j5r4d3x wrote

Thanks a lot for the detailed response! Didn't know you could use a portion of the dataset and expect approximately what you'd get with the whole set. I'm currently just testing different learning rates, but I thought about having a go at dropout rate as well.

2

suflaj t1_j5r5bfw wrote

For the learning rate you should just use a good starting point based on the batch size and architecture, and relegate everything else to the scheduler and optimizer. I don't think there's any point messing with the learning rate once you find one that doesn't blow up your model; just use warmup or plateau schedulers to manage it for you after that.

Since you mentioned Inception, I believe that unless you're using quite big batch sizes, your starting LR should be the magical 3e-4 for Adam or 1e-2 for SGD. You would then just use a ReduceLROnPlateau scheduler with, e.g., a patience of 3 epochs, a cooldown of 2, and a factor of 0.1, and probably employ EarlyStopping if the metric doesn't improve after 6 epochs.
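In Keras those suggestions map roughly onto the built-in callbacks; a sketch, assuming the model and data variables from the earlier examples and val_loss as the monitored metric:

```python
import tensorflow as tf

callbacks = [
    # Cut the LR by a factor of 10 after 3 epochs without improvement,
    # then wait 2 epochs (cooldown) before the patience counter restarts.
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.1, patience=3, cooldown=2),
    # Stop training entirely if the metric hasn't improved for 6 epochs.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=6, restore_best_weights=True),
]

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),  # the "magical" 3e-4
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_tune_train, y_tune_train,
          validation_data=(x_tune_val, y_tune_val),
          epochs=50, callbacks=callbacks)
```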

2

thatpretzelife t1_j5sy10d wrote

As another option, if you haven't already, try looking into a cloud computing solution. For me it cut an image-processing uni assignment down from a couple of hours of processing to about a minute. Google Colab is free, or you can use something like Paperspace, which costs about 8 USD but is much faster.

2

NinjaUnlikely6343 OP t1_j5tmucg wrote

Thanks for the advice! I'm actually already SSH tunneling into the immense computing resources at Compute Canada. It still takes an extremely long time haha

2

emad_eldeen t1_j5uw5sp wrote

Wandb is the best! https://wandb.ai/

Check out the hyperparameter sweep option. It is FANTASTIC!
You can set a range or list of values for each hyperparameter and let it run.
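A minimal sweep sketch (the project name, parameter ranges, and training-function body here are placeholders; the actual model-building and logging code would go inside train()):

```python
import wandb

sweep_config = {
    "method": "random",  # "grid" and "bayes" are the other options
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-2},
        "dropout": {"values": [0.2, 0.3, 0.4, 0.5]},
    },
}

def train():
    run = wandb.init()
    cfg = run.config
    # Build and train the model here using cfg.learning_rate and cfg.dropout,
    # logging the monitored metric each epoch, e.g. wandb.log({"val_accuracy": acc}).

sweep_id = wandb.sweep(sweep_config, project="inceptionv3-finetune")  # hypothetical project name
wandb.agent(sweep_id, function=train, count=10)
```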

2

NinjaUnlikely6343 OP t1_j5vq6vj wrote

I heard about it when I started delving into deep learning, but it seemed too complex for me at the time. I'll check it out!

1