MazenAmria OP t1_izrgnco wrote
Reply to comment by pr0d_ in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
I remember reading it; I'll read it again and discuss it. Thanks.
MazenAmria OP t1_izrgk9j wrote
Reply to comment by suflaj in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
I'm already using a pretrained model as the teacher. But the distillation itself costs nearly as much as training a model from scratch. I'm not insisting, but I feel like I'm doing something wrong and needed some advice (note that I've only had theoretical experience in this area of research; this is the first time I'm doing it practically).
Thanks for your comments.
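For what it's worth, the cost point is easy to see at the level of a single training step: every batch needs a full teacher forward pass on top of the student's forward and backward passes. A minimal sketch of a standard soft-label (Hinton-style) distillation step, assuming hypothetical `teacher` and `student` classifiers that map images to logits and an `optimizer` over the student's parameters only:

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, images, labels, optimizer,
                      temperature=4.0, alpha=0.5):
    teacher.eval()
    with torch.no_grad():                 # no autograd graph for the frozen teacher
        teacher_logits = teacher(images)  # still a full forward pass per batch

    student_logits = student(images)

    # KL divergence between softened distributions, scaled by T^2 as usual
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # ordinary cross-entropy on the ground-truth labels
    hard_loss = F.cross_entropy(student_logits, labels)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

So even with a pretrained teacher, each epoch runs roughly one extra forward pass through the (larger) teacher per batch, which is why the wall-clock cost ends up close to ordinary training.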
MazenAmria OP t1_izpii1s wrote
Reply to comment by sqweeeeeeeeeeeeeeeps in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
To examine whether SWIN itself is overparameterized or not.
MazenAmria OP t1_izonquh wrote
Reply to comment by suflaj in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
That's sad; I'm starting to believe that this research idea is impractical or, maybe more accurately, overly ambitious.
MazenAmria OP t1_izon556 wrote
Reply to comment by sqweeeeeeeeeeeeeeeps in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
> I would expect it to perform more similarly to the full SWIN model on CIFAR-10 because less data complexity.
And that's the problem. If I got, say, 98% accuracy on CIFAR-10 with SWIN-Tiny and then got the same 98% with a smaller model, I wouldn't be proving anything: plenty of simple models already reach 98% on CIFAR-10, so what improvement would I have shown over SWIN-Tiny? Doing the same thing on ImageNet would be a different story.
MazenAmria OP t1_izt68w9 wrote
Reply to comment by suflaj in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
I'm using `with torch.no_grad():` when calculating the output of the teacher model.
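Roughly this pattern, assuming `teacher` is the pretrained SWIN model and `images` is the current batch (both names are placeholders); `torch.no_grad()` together with `teacher.eval()` skips building the teacher's autograd graph, though the teacher forward pass itself is still paid on every batch:

```python
import torch

teacher.eval()                  # fix dropout / batch-norm behaviour for inference
with torch.no_grad():           # don't track gradients for the frozen teacher
    teacher_logits = teacher(images)
# teacher_logits are then used as soft targets in the student's distillation loss
```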