
ruphan t1_j4winzi wrote

It is definitely possible. Let me give an analogy first. In the context of education, let's assume our pretrained model is a person with multiple STEM degrees in fields like neuroscience, math, etc., and let the model trained from scratch be someone with no degree yet. We have a limited amount of resources, say a couple of textbooks on deep learning. It's intuitive that the former should not only pick up deep learning faster but also end up better than the latter, given their stronger understanding of the fundamentals and their experience.

To extend this analogy to your case: I believe the pretrained model is quite big relative to the limited amount of new data you have. The pretrained model would have developed a better set of filters than a big model trained from scratch could ever learn from a relatively small dataset. Just as in the analogy, it doesn't matter that neuroscience and math are not exactly deep learning; having strong fundamentals from pretraining on millions of images is what lets that model reach better accuracy.

If you had a bigger fine-tuning dataset, this gap in accuracy would probably diminish eventually.
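
For what it's worth, here's a rough PyTorch/torchvision sketch of the two setups being compared. The ResNet-18 backbone, the 10-class head, and the frozen-backbone strategy are just placeholder assumptions to make the idea concrete (and it assumes the torchvision >= 0.13 weights API):

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # assumed size of the small target dataset's label set

# Fine-tuning: start from ImageNet-pretrained filters, swap in a new head.
pretrained = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in pretrained.parameters():
    param.requires_grad = False  # freeze the pretrained backbone
pretrained.fc = nn.Linear(pretrained.fc.in_features, num_classes)  # only the head is trained

# Baseline: the same architecture, but randomly initialized end to end.
from_scratch = models.resnet18(weights=None)
from_scratch.fc = nn.Linear(from_scratch.fc.in_features, num_classes)

# For the fine-tuned model, only the head's parameters get updated,
# so the small dataset has far fewer weights to fit.
optimizer = torch.optim.Adam(
    (p for p in pretrained.parameters() if p.requires_grad), lr=1e-3
)
```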
