Submitted by Tiny-Mud6713 t3_yuxamo in MachineLearning
[removed]
I have been trying all of the Keras API transfer models but no luck. Any suggestions on a newer model? I know each model behaves differently depending on the problem, but I'm ready to test anything right now. Also, any tips on the FC architecture?
It doesn't look like there are any convolutions in that net. Fully connected layers don't work that well.
ResNet or Wide ResNet would be a better idea.
The DenseNet201 (functional) layer is the full CNN; it's collapsed in the summary because it's >700 layers. Will try those, thank you.
What about point 1? Did you try keeping the pretrained model frozen?
Try torch lightning
The library?
Yes. I had a similar problem where my model was underperforming in Keras API notebooks; I switched to PyTorch Lightning and it works.
Yes, that's the first step I take; after that I try to unfreeze and fine-tune.
Never worked with Lightning. This may sound dumb, but how does changing the library change the output of the learning process?
If the training loss decreases and validation loss stays the same, this is usually a sign of overfitting. The usual steps I take to avoid this:
- use a dropout layer
- add data augmentations
- get more data
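For the first two points, a minimal Keras sketch (the rates, sizes, and the stand-in backbone here are placeholders, not the OP's setup):

    import tensorflow as tf
    from tensorflow.keras import layers

    # Train-time-only augmentation; these layers are inactive at inference.
    # (In older TF 2.x they live under layers.experimental.preprocessing.)
    augment = tf.keras.Sequential([
        layers.RandomFlip("horizontal_and_vertical"),
        layers.RandomRotation(0.1),  # fraction of a full turn, ~36 degrees
        layers.RandomZoom(0.1),
    ])

    # Example head with dropout between the features and the classifier.
    inputs = tf.keras.Input(shape=(96, 96, 3))
    x = augment(inputs)
    x = layers.Conv2D(32, 3, activation="relu")(x)  # stand-in for a real backbone
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(8, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)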
Perhaps dig deeper on activation functions, optimization algorithm, or step sizes. Try some alternatives.
If your domain images (and things that differentiate between classes) are very different than those in the pretrained network maybe it doesn't have the features you need.
+1, google "stratified sampling"
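To make "stratified" concrete, a quick scikit-learn sketch (X and y are placeholder arrays, not the thread's data):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(3500, 96, 96, 3)     # placeholder images
    y = np.random.randint(0, 8, size=3500)  # placeholder labels, 8 classes

    # stratify=y keeps the class proportions identical in train and
    # validation, which matters with small or imbalanced datasets.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )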
So you kept the exact same model structure, but switched the library and "it works"? I have absolutely no clue what I just read and I'm honestly not sure if I even wanna know
There are some tricks that could increase your accuracy. Also, I don't think you should apply dropout right before the Dense layer there, because Flatten is just reshaping the feature maps into one dimension (flattening them out).
Yeah, the problem is that this is a challenge and the data is limited. I tried data augmentation but haven't had much luck.
However, I must ask: when using data augmentation, is it better to augment both the training and the validation sets, or just the training set? I've seen conflicting opinions online.
It's a challenge; the test runs online on unseen data, and I'm shuffling the split data each run.
Will try that, thanks.
A few more details about your implementation would be useful for us to help you.
How many images are you using for validation?
What batch size and optimizer are you using during training?
What's the dropout rate in the Dropout layers?
How are you preprocessing the images before feeding them to your model? Are you using the tf.keras.applications.densenet.preprocess_input function, as suggested in the Keras documentation?
You should try increasing the batch size if you can, and use data augmentation as others have already suggested.
You can also try other networks besides DenseNet, like one of the ResNet or EfficientNet models, and you can replace the Flatten layer with a GlobalAvgPool2D or GlobalMaxPool2D layer to reduce the parameter count (in my experience the former gives better results). Also, that resizing layer might not be necessary to improve accuracy.
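To illustrate the Flatten-to-pooling swap (a minimal sketch, not the OP's code):

    import tensorflow as tf
    from tensorflow.keras import layers

    base = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", input_shape=(96, 96, 3)
    )
    base.trainable = False

    inputs = tf.keras.Input(shape=(96, 96, 3))
    x = tf.keras.applications.densenet.preprocess_input(inputs)
    x = base(x, training=False)
    # GlobalAveragePooling2D reduces (h, w, 1920) to (1920,), far fewer
    # parameters in the next Dense layer than Flatten would produce.
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(8, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)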
From my understanding, and from a recent work of mine (similar problem), augmenting just the training data is beneficial for interpreting the validation accuracy: the validation data then quite literally functions like the test data, with no alterations. So when you plot the loss on training and validation, that should give you an understanding of how well the model will perform on the test data. For my problem, I augmented just the training data and left the validation and test data as is.
Also, looking at your plots, it could be a sign of an unrepresentative validation set. Ensure that there are enough samples for each class; if you find there aren't, try performing the same augmentations you apply to the training data on the validation data as well to generate more samples.
What is the representation of each class? A class imbalance could create this exact behavior. You don't even need to use a data augmentation technique (I don't have a particularly great opinion of them, personally); just scale the class weights appropriately instead.
Also, what does "Standard" mean here?
You don't augment validation data; you'd be corrupting your validation scores. You'd only augment it at the end, when/if you're training with all the data.
Speaking of which, look at your class representation percentages: accuracy can be completely misleading if you have one or two overwhelmingly represented classes.
Very insightful. I haven't tried most of these things, thanks for sharing the knowledge.
7 classes are equally distributed (500 images each); only 1 has about 25% of the others' share (150-ish). It is a problem, but I'm not sure how to solve it considering it's a challenge and I can't add data, and augmentation will keep the imbalance since it augments everything equally.
One thing I don't know why no one has mentioned yet: why have you kept two linear layers? Two linear layers one after the other in a transfer learning case will lead to very bad generalization. DenseNet is large enough to extract features and make them simple enough for a single layer to understand. Try removing the dense layer between the output and the functional (DenseNet) layer. Also try swapping the Flatten for Global Max or Global Average Pooling.
The data doesn't seem that imbalanced, not enough to cause the issues you're having. And idk what you are using for augmentation, but you can definitely augment classes specifically to solve imbalance (I don't like doing that personally). My next guess would be looking at how you're splitting the data for train/val, and/or freezing the vast majority of the pretrained model and maybe even training only the last layer or two that you add on top.
Regardless, it's something that's useful to know (very frequent in real-world datasets). Here's a link that goes over how to weight classes for such cases; it's written with TensorFlow in mind, but it's the same concept regardless.
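The gist of class weighting, as a minimal sketch (using the rough counts mentioned above: seven classes near 500 images and one near 150):

    import numpy as np

    # Per-class image counts: 7 balanced classes plus one minority class.
    counts = np.array([500, 500, 500, 500, 500, 500, 500, 150])
    total, n_classes = counts.sum(), len(counts)

    # Inverse-frequency weights, normalized so a balanced dataset gives 1.0.
    class_weight = {i: total / (n_classes * c) for i, c in enumerate(counts)}

    # Keras scales each sample's loss by its class weight:
    # model.fit(train_gen, class_weight=class_weight, ...)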
I tried that at first, since it was intuitive and a good benchmark (fewer parameters), but two layers gave better results. Also, GAP caused the training to early-stop very early on. What do you suggest as the top layers, e.g. GAP, batchnorm, dense?
1- I'm doing a 20% split, so in total there are around 2,800 for training and 700 for validation.
2- Batches of 8, Adam with LR=0.001 in the transfer part and LR=0.0001 in the fine-tuning; any other combination caused everything to crumble.
3- Currently 0.3; 0.5 caused some early-stopping problems, since the model got stuck.
4- valid_data_gen = ImageDataGenerator(rescale=1/255.)

   train_data_gen = ImageDataGenerator(
       rescale=1/255.,
       rotation_range=30,
       width_shift_range=0.2,
       height_shift_range=0.2,
       horizontal_flip=True,
       vertical_flip=True,
   )
and then flow from file to get the preprocessed images
Actually, the resizing really boosted the performance by like 5%; I'm at 80% now, but still looking to push it up.
Hahaha, definitely! The pictures are of leaves of 8 different species, and they're square 96-pixel images, so not great to look at visually.
See, from my experience I would ask you to use EfficientNets in the first place. Secondly, please don't unfreeze the model at the very beginning. Train the frozen model with your custom head for a few epochs, and when the loss saturates, reduce the LR, unfreeze the entire network, and train again. Btw, did you try LR scheduling?
Maybe I'm misunderstanding, but is the DenseNet itself frozen? You're only training the one, massive, fully connected layer?
In the post I said I unfroze the CNN layers, but I meant after the transfer learning part. I run it until it early-stops with all CNN layers frozen, then run it again with the top 200 layers or so unfrozen.
I'm obliged to work in Keras and don't know if it has an LR scheduling method. I'll check the API, great advice.
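For reference, Keras does ship LR-scheduling callbacks; a minimal sketch (the factor/patience values are placeholders):

    import tensorflow as tf

    # Halve the LR whenever val_loss plateaus for 3 epochs.
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6
    )
    # Stop (and roll back to the best weights) once val_loss stalls for longer.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=8, restore_best_weights=True
    )
    # model.fit(train_gen, validation_data=valid_gen,
    #           callbacks=[reduce_lr, early_stop], epochs=100)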
Oh no, from the comments I realize I explained things badly. I train the FC layer until it early-stops while the DenseNet is frozen, then I take that model and retrain the weights with 200-ish layers unfrozen and a lower learning rate.
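That two-phase recipe, sketched in Keras (reusing base/model from the DenseNet sketch above; the 200-layer count is just the number mentioned here):

    import tensorflow as tf

    # Phase 1: train only the head with the backbone frozen.
    base.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    # model.fit(...) until early stopping

    # Phase 2: unfreeze the top ~200 layers, recompile at a lower LR, retrain.
    base.trainable = True
    for layer in base.layers[:-200]:
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    # model.fit(...) again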
The problem with EfficientNets is that I ran a test on some models a priori and got this graph; note that each model was run for only 3 epochs.
https://drive.google.com/file/d/1OyXaWg6vMirYeI9zLSeGJ2v_qCz3msu4/view?usp=share_link
Something seems to be wrong; the validation scores should not be so low. Exactly what type of data are you dealing with?
They're pictures of plants: 8 classes for 8 different species of the same plant type.
So my friend, then you have to train the network from scratch; it is getting trapped in a local minimum. Maybe a small network might help. Try training a ResNet15 or something similar from scratch. This happened to me once: I was working with simulation images and could not get the AUC score above 0.92, and once I trained from scratch I got AUC scores close to 0.99, 0.98, etc.
ImageNet-1k pretraining might not be the best for this, as it contains few plant classes. The bigger ImageNet-21k has a much larger selection of plants and might be better suited for you. timm has EfficientNetV2, BEiT, ViT, and ConvNeXt models pretrained on it; though I don't use Keras, you might be able to find them for that framework.
So I import the model, unfreeze it immediately, and just add my top layers?
Yes, and train them; everything is unfrozen.
Here are some tricks that have worked for me in a similar enough use case:

I definitely found that carefully considering the objective function has the most influence on performance on problems like this.

- try a cyclic learning rate schedule - here, you aren't necessarily trying to get the best results off the bat; you can, however, study the train and validation loss plots to see how the learning rate at different epochs impacts your results (a sketch follows this list)
- data augmentation - try as many kinds as you see reasonable
- DenseNet reuses the feature maps of every layer in each subsequent layer, and that can help guide how you tweak your algorithm further

Good luck!
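A triangular cyclic LR is easy to sketch with Keras's LearningRateScheduler (the bounds and period below are placeholders, not tuned values):

    import tensorflow as tf

    def triangular_lr(epoch, lr):
        # Cycle linearly between base_lr and max_lr every `period` epochs;
        # the incoming lr argument is intentionally ignored.
        base_lr, max_lr, period = 1e-5, 1e-3, 10
        pos = epoch % period
        half = period / 2
        scale = pos / half if pos < half else (period - pos) / half
        return base_lr + (max_lr - base_lr) * scale

    clr = tf.keras.callbacks.LearningRateScheduler(triangular_lr)
    # model.fit(..., callbacks=[clr])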
Adam is usually more likely to overfit, so using SGD with Nesterov momentum might help a bit. I'd also recommend augmenting contrast, brightness, saturation and hue if those options are available for the ImageDataGenerator class.
Also, does the rotation in the ImageDataGenerator fill the background with black pixels, or is there an option to extend/reflect the image? In my experience, simply filling the background with black after rotation tends to hurt accuracy.
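ImageDataGenerator does expose this through fill_mode; a quick sketch (values illustrative):

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # fill_mode options: "constant" (cval pixels), "nearest", "reflect", "wrap".
    train_data_gen = ImageDataGenerator(
        rescale=1/255.,
        rotation_range=30,
        fill_mode="reflect",  # mirror edge content instead of black borders
    )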
One trick that might also help is to extract outputs not only from the last layer of the pretrained network but also from earlier layers, and feed those into your network. In my experience this can help improve accuracy. I've done this with the EfficientNetB0, so I've pasted some example code here to help you out; if you don't want to use an EfficientNet, I'm sure this can be adapted to the DenseNet201 too.
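(The pasted code didn't survive the thread export; below is a minimal sketch of the idea. The tap layer names are assumptions; check base.summary() to pick your own.)

    import tensorflow as tf
    from tensorflow.keras import layers

    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=(96, 96, 3)
    )
    base.trainable = False

    # Tap intermediate feature maps in addition to the final output.
    taps = [base.get_layer(n).output for n in ("block3b_add", "block5c_add")]
    taps.append(base.output)

    # Pool each tap to a vector and concatenate into one feature vector.
    pooled = [layers.GlobalAveragePooling2D()(t) for t in taps]
    features = layers.Concatenate()(pooled)

    x = layers.Dropout(0.3)(features)
    outputs = layers.Dense(8, activation="softmax")(x)
    model = tf.keras.Model(base.input, outputs)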
Of course, sometimes transfer learning just doesn't really help, so if nothing else pushes the accuracy above 90%, it might be best to build and train your own model from scratch to better suit your needs.
I haven't tried playing with the optimizer, thank you for the notice. Also thanks for the code; I'll try to play around with it too :)
Try without any fine tuning, use the pretrained network as a preprocessing step.
Try a different/newer model.
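A sketch of that no-fine-tuning route, using the frozen network purely as a feature extractor with a small classifier on top (the logistic-regression choice and the placeholder data are assumptions):

    import numpy as np
    import tensorflow as tf
    from sklearn.linear_model import LogisticRegression

    base = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet",
        input_shape=(96, 96, 3), pooling="avg"
    )

    X = np.random.rand(100, 96, 96, 3) * 255  # placeholder images
    y = np.random.randint(0, 8, size=100)     # placeholder labels

    # One-off preprocessing: push every image through the frozen network.
    feats = base.predict(tf.keras.applications.densenet.preprocess_input(X))

    clf = LogisticRegression(max_iter=1000).fit(feats, y)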