
PhysZhongli t1_jccbh4w wrote

Hi everyone, I am a novice trying to learn ML and AI. I am trying to train a CNN to classify 9000+ images across 100 labels. The images are flower patterns/leaves from what I can tell. The catch is that the actual test dataset has 101 labels, and when the model detects an image that does not belong to any of the original 100 labels it has to assign it to the 101st label. What would be the best way to go about doing this?

I have used ResNet50 with ImageNet weights and made some of the pretrained layers trainable to fine-tune the model. I followed it with a global average pooling layer, a 1024-node dense layer with L2 regularization, batch norm, dropout, and a softmax layer as the classifier. I am using the Adam optimizer with a batch size of 16 and a learning rate of 0.0001. I then set a threshold of 0.6, and if the model's top prediction is below that threshold it assigns the 101st label. Currently I have ~90% testing accuracy.
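In case it helps others follow along, here is a minimal sketch of how I understand my own setup in Keras. The number of unfrozen layers, the dropout rate, the L2 strength, and the input size are placeholder choices, not exactly what I used:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers

NUM_KNOWN_CLASSES = 100   # the 100 labels seen during training
UNKNOWN_LABEL = 100       # index used for the 101st "none of the above" label
THRESHOLD = 0.6           # confidence cutoff (hyperparameter to tune)

# ResNet50 backbone with ImageNet weights; some layers left trainable for fine-tuning
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3))
for layer in base.layers[:-30]:   # freeze all but the last ~30 layers (illustrative choice)
    layer.trainable = False

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, kernel_regularizer=regularizers.l2(1e-4)),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_KNOWN_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

def predict_with_rejection(images):
    """Predict one of the 100 known labels, or the 101st label if confidence is low."""
    probs = model.predict(images, batch_size=16)
    preds = probs.argmax(axis=1)
    preds[probs.max(axis=1) < THRESHOLD] = UNKNOWN_LABEL
    return preds
```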

Are there any obvious things I should be doing better or changing, and how can I go about optimising the threshold value? Or is there a better way to handle the 101st label? Should I be using ResNet or something else for flower patterns and leaves, given my training dataset of 9000+ images?


LeN3rd t1_jcgo5ro wrote

You should take a look at uncertainty in general. What you are trying to do is estimate epistemic uncertainty (google epistemic vs. aleatoric uncertainty).

One thing that works well is to have a dropout layer that stays active during prediction (in TensorFlow you have to pass training=True into the call to activate it at prediction time). Sample around 100 times and calculate the standard deviation. This gives you a general "I do not know" signal from the network. You can also do this by training ~20 models and letting them output 20 different results. With either approach you can assign the 101st label when the uncertainty is too high, as sketched below.
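A rough sketch of the Monte Carlo dropout idea on top of a trained Keras model; the sample count and the uncertainty threshold here are made-up numbers you would tune on validation data:

```python
import numpy as np
import tensorflow as tf

N_SAMPLES = 100        # number of stochastic forward passes
STD_THRESHOLD = 0.15   # uncertainty cutoff for the "unknown" class (tune on held-out data)
UNKNOWN_LABEL = 100    # index of the 101st label

def mc_dropout_predict(model, images):
    """Run the model with dropout active and use the prediction spread as an 'I don't know' signal."""
    # training=True keeps dropout on at prediction time
    # (caveat: it also puts batch norm in training mode, which you may want to avoid)
    samples = np.stack([model(images, training=True).numpy()
                        for _ in range(N_SAMPLES)])          # shape: (N_SAMPLES, batch, 100)
    mean_probs = samples.mean(axis=0)
    std_probs = samples.std(axis=0)

    preds = mean_probs.argmax(axis=1)
    # spread of the predicted class across samples; high spread -> assign the 101st label
    uncertainty = std_probs[np.arange(len(preds)), preds]
    preds[uncertainty > STD_THRESHOLD] = UNKNOWN_LABEL
    return preds, uncertainty
```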

In my experience you should stay away from Bayesian neural networks, since they are extremely hard to train and cannot model multimodal uncertainty (neither can dropout, but it is WAAAAYYY easier to train).
