lambdasintheoutfield t1_iwcov7x wrote on November 14, 2022 at 5:48 PM

Here are some tricks that have worked for me in a similar enough use case:

use triplet loss + weighted cross-entropy loss (possibly with a weight on the triplet loss term itself).

I definitely found that carefully considering the objective function has the most influence on performance on problems like this.
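To make the combined objective concrete, here is a minimal PyTorch sketch. The class name `TripletPlusWeightedCE`, the class weights, the `triplet_weight` of 0.5, the margin, and the tensor shapes are all placeholder assumptions for illustration, not values from my setup. How you mine the anchor/positive/negative embeddings is up to you and is not shown.

```python
import torch
import torch.nn as nn


class TripletPlusWeightedCE(nn.Module):
    """Weighted cross entropy plus a scaled triplet margin loss (illustrative)."""

    def __init__(self, class_weights, triplet_weight=0.5, margin=1.0):
        super().__init__()
        # Per-class weights let rare classes contribute more to the CE term.
        self.ce = nn.CrossEntropyLoss(weight=class_weights)
        self.triplet = nn.TripletMarginLoss(margin=margin)
        # Scalar that balances the triplet term against the CE term.
        self.triplet_weight = triplet_weight

    def forward(self, logits, targets, anchor, positive, negative):
        ce_loss = self.ce(logits, targets)
        triplet_loss = self.triplet(anchor, positive, negative)
        return ce_loss + self.triplet_weight * triplet_loss


# Toy batch: 8 samples, 2 classes, 16-dimensional embeddings (all hypothetical).
torch.manual_seed(0)
class_weights = torch.tensor([1.0, 3.0])
criterion = TripletPlusWeightedCE(class_weights, triplet_weight=0.5)
logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
anchor, positive, negative = (torch.randn(8, 16) for _ in range(3))
loss = criterion(logits, targets, anchor, positive, negative)
```

Treat `triplet_weight` as one more hyperparameter to tune alongside the class weights.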

try a cyclic learning rate schedule - the goal here isn't necessarily to get the best results right away. Instead, study the train and validation loss plots to see how the learning rate at different epochs affects your results.
data augmentation - try as many kinds as seem reasonable
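For the cyclic schedule, one common shape is the triangular policy, where the learning rate ramps linearly between a lower and an upper bound. Below is a small pure-Python sketch of that shape. The function name `triangular_lr` and the values of `base_lr`, `max_lr`, and `step_size` are arbitrary assumptions. In practice PyTorch's `torch.optim.lr_scheduler.CyclicLR` covers this, so the sketch is only meant to show what the schedule does.

```python
import math


def triangular_lr(step, base_lr=1e-4, max_lr=1e-2, step_size=100):
    """Learning rate at a given step under a triangular cyclic schedule.

    The rate climbs linearly from base_lr to max_lr over step_size steps,
    then descends back over the next step_size steps, and repeats.
    """
    cycle = math.floor(1 + step / (2 * step_size))
    # x is 1 at the ends of a cycle and 0 at its peak.
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Sample the schedule every 50 steps across two full cycles.
schedule = [triangular_lr(s) for s in range(0, 401, 50)]
```

Logging this value next to the train and validation loss at each step makes it easier to see which learning rate ranges help and which ones hurt.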

DenseNet feeds each layer's feature maps into every subsequent layer within a dense block, and knowing that can help guide how you tweak your algorithm further
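If the feature reuse is hard to picture, here is a stripped-down dense block in PyTorch. `TinyDenseBlock`, the channel counts, and the layer count are made up for illustration. Real DenseNets add bottleneck and transition layers, which are left out here.

```python
import torch
import torch.nn as nn


class TinyDenseBlock(nn.Module):
    """Simplified dense block: each layer sees all earlier feature maps."""

    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Input width grows by growth_rate with every preceding layer.
            channels = in_channels + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate every earlier output along the channel axis.
            new_maps = layer(torch.cat(features, dim=1))
            features.append(new_maps)
        return torch.cat(features, dim=1)


block = TinyDenseBlock(in_channels=8, growth_rate=4, num_layers=3)
out = block(torch.randn(2, 8, 16, 16))
# Output channels: 8 input + 3 layers * 4 new maps each = 20.
```

Because every layer's output is kept and concatenated, `growth_rate` and the number of layers per block are the main knobs for how quickly the channel count grows.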

Good luck!