Submitted by Tiny-Mud6713 t3_yuxamo in MachineLearning
[removed]
I have been trying all of the Keras API transfer models but no luck. Any suggestions on a newer model? I know each model behaves differently depending on the problem, but I'm ready to test anything right now. Also, any tips on the FC architecture?
It doesn't look like there are any convolutions in that net. Fully connected layers don't work that well.
ResNet or Wide ResNet would be a better idea.
The DenseNet201 (functional) layer is the full CNN; it's collapsed in the summary because it's >700 layers. Will try those, thank you.
What about point 1? Did you try keeping the pretrained model frozen?
Try torch lightning
The library?
Yes. I had a similar problem where my model was underperforming in Keras API notebooks; I switched to PyTorch Lightning and it works.
Yes, that's the first step I take; after that I try to unfreeze and fine-tune.
Never worked with Lightning. This may sound dumb, but how does changing the library change the output of the learning process?
If the training loss decreases and validation loss stays the same, this is usually a sign of overfitting. The usual steps I take to avoid this:
- use a dropout layer
- add data augmentations
- get more data
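For the first two points, a minimal Keras sketch (the rates, sizes, and the stand-in backbone here are placeholders, not the OP's setup):

    import tensorflow as tf
    from tensorflow.keras import layers

    # Train-time-only augmentation; these layers are inactive at inference.
    # (In older TF 2.x they live under layers.experimental.preprocessing.)
    augment = tf.keras.Sequential([
        layers.RandomFlip("horizontal_and_vertical"),
        layers.RandomRotation(0.1),  # fraction of a full turn, ~36 degrees
        layers.RandomZoom(0.1),
    ])

    # Example head with dropout between the features and the classifier.
    inputs = tf.keras.Input(shape=(96, 96, 3))
    x = augment(inputs)
    x = layers.Conv2D(32, 3, activation="relu")(x)  # stand-in for a real backbone
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(8, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)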
Perhaps dig deeper on activation functions, optimization algorithm, or step sizes. Try some alternatives.
If your domain images (and things that differentiate between classes) are very different than those in the pretrained network maybe it doesn't have the features you need.
+1, google "stratified sampling"
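To make "stratified" concrete, a quick scikit-learn sketch (X and y are placeholder arrays, not the thread's data):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(3500, 96, 96, 3)     # placeholder images
    y = np.random.randint(0, 8, size=3500)  # placeholder labels, 8 classes

    # stratify=y keeps the class proportions identical in train and
    # validation, which matters with small or imbalanced datasets.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )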
So you kept the exact same model structure, but switched the library and "it works"? I have absolutely no clue what I just read and I'm honestly not sure if I even wanna know
There are some tricks that could increase your accuracy. Also, I don't think you should apply dropout right before the Dense layer there, because Flatten is just reshaping the feature maps into one dimension (flattening them out).
Yeah, the problem is that this is a challenge and the data is limited. I tried data augmentation but haven't had much luck.
However, I must ask: when using data augmentation, is it better to augment both the training and the validation sets, or just the training set? I've seen conflicting opinions online.
It's a challenge; the test runs online on unseen data, and I'm shuffling the split data each run.
Will try that, thanks.
A few more details about your implementation would be useful for us to help you.
How many images are you using for validation?
What batch size and optimizer are you using during training?
What's the dropout rate in the Dropout layers?
How are you preprocessing the images before feeding them to your model? Are you using the tf.keras.applications.densenet.preprocess_input function, as suggested in the Keras documentation?
You should try increasing the batch size if you can, and use data augmentation as others have already suggested.
You can also try other networks besides DenseNet, like one of the ResNet or EfficientNet models, and you can replace the Flatten layer with a GlobalAvgPool2D or GlobalMaxPool2D layer to reduce the parameter count (in my experience the former gives better results). Also, that resizing layer might not be necessary to improve accuracy.
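To illustrate the Flatten-to-pooling swap (a minimal sketch, not the OP's code):

    import tensorflow as tf
    from tensorflow.keras import layers

    base = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", input_shape=(96, 96, 3)
    )
    base.trainable = False

    inputs = tf.keras.Input(shape=(96, 96, 3))
    x = tf.keras.applications.densenet.preprocess_input(inputs)
    x = base(x, training=False)
    # GlobalAveragePooling2D reduces (h, w, 1920) to (1920,), far fewer
    # parameters in the next Dense layer than Flatten would produce.
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(8, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)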
From my understanding, and from a recent work of mine (similar problem), augmenting just the training data is beneficial for interpreting the validation accuracy: the validation data then quite literally functions like the test data, with no alterations. So when you plot the loss on training and validation, that should give you an understanding of how well the model will perform on the test data. For my problem, I augmented just the training data and left the validation and test data as is.
Also, looking at your plots, it could be a sign of an unrepresentative validation set. Ensure that there are enough samples for each class; if you find there aren't, try performing the same augmentations you apply to the training data on the validation data as well to generate more samples.
What is the representation of each class? A class imbalance could create this exact behavior. You don't even need to use a data augmentation technique (I don't have a particularly great opinion of them, personally); just scale the class weights appropriately instead.
Also, what does "Standard" mean here?
You don't augment validation data; you'd be corrupting your validation scores. You'd only augment it at the end, when/if you're training with all the data.
Speaking of which, look at your class representation percentages: accuracy can be completely misleading if you have one or two overwhelmingly represented classes.
Very insightful. I haven't tried most of these things, thanks for sharing the knowledge.
7 classes are equally distributed (500 images each); only 1 has about 25% of the others' share (150-ish). It is a problem, but I'm not sure how to solve it considering it's a challenge and I can't add data, and augmentation will keep the imbalance since it augments everything equally.
One thing I don't know why no one has mentioned yet: why have you kept two linear layers? Two linear layers one after the other in a transfer learning case will lead to very bad generalization. DenseNet is large enough to extract features and make them simple enough for a single layer to understand. Try removing the dense layer between the output and the functional (DenseNet) layer. Also try swapping the Flatten for Global Max or Global Average Pooling.
The data doesn't seem that imbalanced, not enough to cause the issues you're having. And idk what you are using for augmentation, but you can definitely augment classes specifically to solve imbalance (I don't like doing that personally). My next guess would be looking at how you're splitting the data for train/val, and/or freezing the vast majority of the pretrained model and maybe even training only the last layer or two that you add on top.
Regardless, it's something that's useful to know (very frequent in real-world datasets). Here's a link that goes over how to weight classes for such cases; it's written with TensorFlow in mind, but it's the same concept regardless.
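The gist of class weighting, as a minimal sketch (using the rough counts mentioned above: seven classes near 500 images and one near 150):

    import numpy as np

    # Per-class image counts: 7 balanced classes plus one minority class.
    counts = np.array([500, 500, 500, 500, 500, 500, 500, 150])
    total, n_classes = counts.sum(), len(counts)

    # Inverse-frequency weights, normalized so a balanced dataset gives 1.0.
    class_weight = {i: total / (n_classes * c) for i, c in enumerate(counts)}

    # Keras scales each sample's loss by its class weight:
    # model.fit(train_gen, class_weight=class_weight, ...)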
I tried that at first, since it was intuitive and a good benchmark (fewer parameters), but two layers gave better results. Also, GAP caused the training to early-stop very early on. What do you suggest as the top layers, e.g. GAP, batchnorm, dense?
1- I'm doing a 20% split, so in total there are around 2,800 for training and 700 for validation.
2- Batches of 8, Adam with LR=0.001 in the transfer part and LR=0.0001 in the fine-tuning; any other combination caused everything to crumble.
3- Currently 0.3; 0.5 caused some early-stopping problems, since the model got stuck.
4- valid_data_gen = ImageDataGenerator(rescale=1/255.)

   train_data_gen = ImageDataGenerator(
       rescale=1/255.,
       rotation_range=30,
       width_shift_range=0.2,
       height_shift_range=0.2,
       horizontal_flip=True,
       vertical_flip=True,
   )
and then flow from file to get the preprocessed images
Actually, the resizing really boosted the performance by like 5%; I'm at 80% now, but still looking to push it up.
Hahaha, definitely! The pictures are of leaves of 8 different species, and they're square 96-pixel images, so not great to look at visually.
See, from my experience I would ask you to use EfficientNets in the first place. Secondly, please don't unfreeze the model at the very beginning. Train the frozen model with your custom head for a few epochs, and when the loss saturates, reduce the LR, unfreeze the entire network, and train again. Btw, did you try LR scheduling?
Maybe I'm misunderstanding, but is the DenseNet itself frozen? You're only training the one, massive, fully connected layer?
In the post I said I unfroze the CNN layers, but I meant after the transfer learning part. I run it until it early-stops with all CNN layers frozen, then run it again with the top 200 layers or so unfrozen.
I'm obliged to work in Keras and don't know if it has an LR scheduling method. I'll check the API, great advice.
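For reference, Keras does ship LR-scheduling callbacks; a minimal sketch (the factor/patience values are placeholders):

    import tensorflow as tf

    # Halve the LR whenever val_loss plateaus for 3 epochs.
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6
    )
    # Stop (and roll back to the best weights) once val_loss stalls for longer.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=8, restore_best_weights=True
    )
    # model.fit(train_gen, validation_data=valid_gen,
    #           callbacks=[reduce_lr, early_stop], epochs=100)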
Oh no, from the comments I realize I explained things badly. I train the FC layer until it early-stops while the DenseNet is frozen, then I take that model and retrain the weights with 200-ish layers unfrozen and a lower learning rate.
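That two-phase recipe, sketched in Keras (reusing base/model from the DenseNet sketch above; the 200-layer count is just the number mentioned here):

    import tensorflow as tf

    # Phase 1: train only the head with the backbone frozen.
    base.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    # model.fit(...) until early stopping

    # Phase 2: unfreeze the top ~200 layers, recompile at a lower LR, retrain.
    base.trainable = True
    for layer in base.layers[:-200]:
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    # model.fit(...) again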
The problem with EfficientNets is that I ran a test on some models a priori and got this graph; note that each model was run for only 3 epochs.
https://drive.google.com/file/d/1OyXaWg6vMirYeI9zLSeGJ2v_qCz3msu4/view?usp=share_link
Something seems to be wrong; the validation scores should not be so low. Exactly what type of data are you dealing with?
They're pictures of plants: 8 classes for 8 different species of the same plant type.
So my friend, then you have to train the network from scratch; it is getting trapped in a local minimum. Maybe a small network might help. Try training a ResNet15 or something similar from scratch. This happened to me once: I was working with simulation images and could not get the AUC score above 0.92, and once I trained from scratch I got AUC scores close to 0.99, 0.98, etc.
ImageNet-1k pretraining might not be the best for this, as it contains few plant classes. The bigger ImageNet-21k has a much larger selection of plants and might be better suited for you. timm has EfficientNetV2, BEiT, ViT, and ConvNeXt models pretrained on it; though I don't use Keras, you might be able to find them for that framework.
So I import the model, unfreeze it immediately, and just add my top layers?
Yes, and train them; everything is unfrozen.
Here are some tricks that have worked for me in a similar enough use case:

I definitely found that carefully considering the objective function has the most influence on performance on problems like this.

- try a cyclic learning rate schedule - here, you aren't necessarily trying to get the best results off the bat; you can, however, study the train and validation loss plots to see how the learning rate at different epochs impacts your results (a sketch follows this list)
- data augmentation - try as many kinds as you see reasonable
- DenseNet reuses the feature maps of every layer in each subsequent layer, and that can help guide how you tweak your algorithm further

Good luck!
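A triangular cyclic LR is easy to sketch with Keras's LearningRateScheduler (the bounds and period below are placeholders, not tuned values):

    import tensorflow as tf

    def triangular_lr(epoch, lr):
        # Cycle linearly between base_lr and max_lr every `period` epochs;
        # the incoming lr argument is intentionally ignored.
        base_lr, max_lr, period = 1e-5, 1e-3, 10
        pos = epoch % period
        half = period / 2
        scale = pos / half if pos < half else (period - pos) / half
        return base_lr + (max_lr - base_lr) * scale

    clr = tf.keras.callbacks.LearningRateScheduler(triangular_lr)
    # model.fit(..., callbacks=[clr])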
Adam is usually more likely to overfit, so using SGD with Nesterov momentum might help a bit. I'd also recommend augmenting contrast, brightness, saturation and hue if those options are available for the ImageDataGenerator class.
Also, does the rotation in the ImageDataGenerator fill the background with black pixels, or is there an option to extend/reflect the image? In my experience, simply filling the background with black after rotation tends to hurt accuracy.
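ImageDataGenerator does expose this through fill_mode; a quick sketch (values illustrative):

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # fill_mode options: "constant" (cval pixels), "nearest", "reflect", "wrap".
    train_data_gen = ImageDataGenerator(
        rescale=1/255.,
        rotation_range=30,
        fill_mode="reflect",  # mirror edge content instead of black borders
    )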
One trick that might also help is to extract outputs not only from the last layer of the pretrained network but also from earlier layers, and feed those into your network. In my experience this can help improve accuracy. I've done this with the EfficientNetB0, so I've pasted some example code here to help you out; if you don't want to use an EfficientNet, I'm sure this can be adapted to the DenseNet201 too.
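(The pasted code didn't survive the thread export; below is a minimal sketch of the idea. The tap layer names are assumptions; check base.summary() to pick your own.)

    import tensorflow as tf
    from tensorflow.keras import layers

    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=(96, 96, 3)
    )
    base.trainable = False

    # Tap intermediate feature maps in addition to the final output.
    taps = [base.get_layer(n).output for n in ("block3b_add", "block5c_add")]
    taps.append(base.output)

    # Pool each tap to a vector and concatenate into one feature vector.
    pooled = [layers.GlobalAveragePooling2D()(t) for t in taps]
    features = layers.Concatenate()(pooled)

    x = layers.Dropout(0.3)(features)
    outputs = layers.Dense(8, activation="softmax")(x)
    model = tf.keras.Model(base.input, outputs)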
Of course, sometimes transfer learning just doesn't really help, so if nothing else pushes the accuracy above 90%, it might be best to build and train your own model from scratch to better suit your needs.
I haven't tried playing with the optimizer, thank you for the notice. Also thanks for the code; I'll try to play around with it too :)
Try without any fine tuning, use the pretrained network as a preprocessing step.
Try a different/newer model.
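A sketch of that no-fine-tuning route, using the frozen network purely as a feature extractor with a small classifier on top (the logistic-regression choice and the placeholder data are assumptions):

    import numpy as np
    import tensorflow as tf
    from sklearn.linear_model import LogisticRegression

    base = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet",
        input_shape=(96, 96, 3), pooling="avg"
    )

    X = np.random.rand(100, 96, 96, 3) * 255  # placeholder images
    y = np.random.randint(0, 8, size=100)     # placeholder labels

    # One-off preprocessing: push every image through the frozen network.
    feats = base.predict(tf.keras.applications.densenet.preprocess_input(X))

    clf = LogisticRegression(max_iter=1000).fit(feats, y)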