Submitted by OffswitchToggle t3_zd6r90 in deeplearning
I have a project where I need to detect the orientation of machined parts on an assembly line. I have a dataset of hundreds of thousands of correctly labeled images (i.e. "correct orientation", "rotated left", "rotated right", "upside down"). Each image is 1024x768.
I'm relatively new to DL but have done a lot of reading about CNNs. I've come across many articles that discuss the process of hyperparameter tuning. But, I haven't been able to find anything related to creating an initial CNN architecture based upon the type of problem you are trying to solve. I've seen setups as basic as "dogs vs cats" up through implementing VGG16 from scratch.
How do I choose an initial architecture that is appropriate for my problem? TIA
saw79 t1_iz0158r wrote
I don't think it makes sense these days to implement a CNN architecture from scratch for a standard problem (e.g., classification), except as a learning exercise. A common set of classification networks that I use as a go-to are the EfficientNet architectures. Usually I use the `timm` library (for PyTorch), and instantiating the model is just 1 line of code (see its docs). You can either load it pretrained (from ImageNet) or randomly initialized, and fine-tune it yourself. EfficientNet has versions 0-7 that give increasing performance at the cost of computation/size. If you're in TensorFlow-land I'm sure there's something analogous. Both TF and PT also have model zoos in official packages, like `torchvision.models` or whatever.
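
As a rough illustration of that one-liner, here is a minimal sketch using `timm.create_model` (the choice of `efficientnet_b0`, the 4-class head matching OP's orientation labels, and the 224x224 input size are assumptions; check the timm docs for the exact variants and default input resolutions):

```python
import timm
import torch

# Four orientation classes from the original post:
# correct, rotated left, rotated right, upside down.
NUM_CLASSES = 4

# One line to get an EfficientNet-B0 with ImageNet-pretrained weights
# and a freshly initialized 4-way classification head.
model = timm.create_model("efficientnet_b0", pretrained=True, num_classes=NUM_CLASSES)

# Sanity check with a dummy batch; the 1024x768 line-scan images would
# need resizing/cropping to whatever resolution the chosen model expects.
dummy = torch.randn(1, 3, 224, 224)
logits = model(dummy)
print(logits.shape)  # torch.Size([1, 4])
```

From there it's the usual fine-tuning loop (cross-entropy loss, an optimizer over `model.parameters()`), swapping `efficientnet_b0` for a larger variant only if the small one plateaus.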