Submitted by tysam_and_co t3_10op6va in MachineLearning
tysam_and_co OP t1_j6hxgzk wrote
Reply to comment by shellyturnwarm in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Hi hi hiya there! Great questions, thanks so much for asking them! :D
For the dataloaders, the dataloading only happens once -- after that, the dataset is saved on disk as an fp16 tensor. It's wayyyyy faster for experimentation this way. We only need to load the data once, move it to the GPU, and then just dynamically slice it on the GPU each step! :D
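A rough sketch of that pattern, if it helps -- this isn't the actual code from the repo, just a minimal illustration of "preprocess once, cache as an fp16 tensor, then index on-device instead of using a DataLoader" (the names `preload_dataset`/`get_batch` and the random-tensor stand-in for CIFAR10 are my own placeholders):

```python
import torch

def preload_dataset(path="cifar10_fp16.pt", n=50000):
    # One-time cost: in the real setup this would come from torchvision's
    # CIFAR10; random tensors here just to keep the sketch self-contained.
    try:
        images, labels = torch.load(path)
    except FileNotFoundError:
        images = torch.rand(n, 3, 32, 32).half()   # stored as fp16 on disk
        labels = torch.randint(0, 10, (n,))
        torch.save((images, labels), path)
    # Move the whole dataset to the GPU once (falls back to CPU if absent).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return images.to(device), labels.to(device)

def get_batch(images, labels, batch_size=512):
    # "Dynamic slicing": sample a batch by indexing the resident tensor --
    # no worker processes, no per-step host->device copies.
    idx = torch.randperm(len(images), device=images.device)[:batch_size]
    return images[idx], labels[idx]
```

After the first run the preprocessed tensor is just sitting on disk, so every subsequent experiment skips straight to the GPU-resident indexing.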
As for self.se, that used to be a flag for the squeeze_and_excite layers. I think it's redundant now since they're just on by default -- this is a one-person show and I'm moving a lot of parts fast, so there are oftentimes little extraneous bits and pieces hanging around. I'll try to clean that up on the next pass -- very many thanks for pointing that out and asking!
I'm happy to answer any other questions that you might have! :D