Submitted by OffswitchToggle t3_zd6r90 in deeplearning
I have a project where I need to detect the orientation of machined parts on an assembly line. I have a dataset of hundreds of thousands of correctly labeled images (i.e. "correct orientation", "rotated left", "rotated right", "upside down"). Each image is 1024x768.
I'm relatively new to DL but have done a lot of reading about CNNs. I've come across many articles that discuss the process of hyperparameter tuning. But, I haven't been able to find anything related to creating an initial CNN architecture based upon the type of problem you are trying to solve. I've seen setups as basic as "dogs vs cats" up through implementing VGG16 from scratch.
How do I choose an initial architecture that is appropriate for my problem? TIA
saw79 t1_iz0158r wrote
I don't think it makes sense these days to implement a CNN architecture from scratch for a standard problem (e.g., classification), except as a learning exercise. A common set of classification networks that I use as a go-to are the EfficientNet architectures. Usually I use the `timm` library (for PyTorch), and instantiating the model is just 1 line of code (see its docs). You can either load it pretrained (from ImageNet) or randomly initialized, and fine-tune it yourself. EfficientNet has versions 0-7 that give increasing performance at the cost of computation/size. If you're in TensorFlow-land I'm sure there's something analogous. Both TF and PT also have model zoos in official packages, like `torchvision.models` or whatever.
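
As a rough illustration of that one-liner, here is a minimal sketch using `timm.create_model` (the choice of `efficientnet_b0`, the 4-class head matching OP's orientation labels, and the 224x224 input size are assumptions; check the timm docs for the exact variants and default input resolutions):

```python
import timm
import torch

# Four orientation classes from the original post:
# correct, rotated left, rotated right, upside down.
NUM_CLASSES = 4

# One line to get an EfficientNet-B0 with ImageNet-pretrained weights
# and a freshly initialized 4-way classification head.
model = timm.create_model("efficientnet_b0", pretrained=True, num_classes=NUM_CLASSES)

# Sanity check with a dummy batch; the 1024x768 line-scan images would
# need resizing/cropping to whatever resolution the chosen model expects.
dummy = torch.randn(1, 3, 224, 224)
logits = model(dummy)
print(logits.shape)  # torch.Size([1, 4])
```

From there it's the usual fine-tuning loop (cross-entropy loss, an optimizer over `model.parameters()`), swapping `efficientnet_b0` for a larger variant only if the small one plateaus.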