Viewing a single comment thread.

junetwentyfirst2020 t1_j4jgu4t wrote

I’m not sure why you think that’s such a crummy graphics card. I’ve trained a lot of interesting things for grad school, and even in the workplace, on 4 GB less. If you’re fine-tuning, it’s not really going to take that long to get decent results, and 16 GB is not bad.
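Just to make that concrete, here’s a minimal fine-tuning sketch of the kind that fits comfortably in 16 GB (assuming PyTorch and torchvision; the 10-class head and `train_loader` are placeholders, not anything from the thread):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 (~25M parameters).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the backbone so only the new head is trained;
# this keeps gradient and optimizer memory tiny.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head for a hypothetical 10-class task.
model.fc = nn.Linear(model.fc.in_features, 10)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(train_loader):
    """train_loader is a placeholder DataLoader yielding (images, labels)."""
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

With the backbone frozen, only the new head needs gradients and optimizer state, so batch sizes in the tens fit easily in 16 GB; unfreezing the last block or two is a common next step.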

7

currentscurrents t1_j4jj1l6 wrote

It's a little discouraging when every interesting paper lists a cluster of 64 A100s in its methods section.

6

junetwentyfirst2020 t1_j4jkejb wrote

The original Vision Transformer (ViT) paper is pretty clear that it works better at scale. You might not need a transformer for interesting work, though.

You can do so much with that GPU. Transformers are heavier models, but my background is in CNNs, and those work fine on your GPU.
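To put rough numbers on “heavier” (a quick sketch using torchvision’s bundled model definitions; exact counts depend on the variant you pick):

```python
import torch
from torchvision import models

def count_params(model: torch.nn.Module) -> int:
    """Total number of parameters in the model."""
    return sum(p.numel() for p in model.parameters())

# ResNet-50, a standard CNN baseline.
resnet = models.resnet50()
# ViT-B/16, the base Vision Transformer.
vit = models.vit_b_16()

print(f"ResNet-50: {count_params(resnet) / 1e6:.1f}M params")  # ~25.6M
print(f"ViT-B/16:  {count_params(vit) / 1e6:.1f}M params")     # ~86.6M
```

Parameter count isn’t the whole memory story (activations and optimizer state dominate during training), but it’s a rough proxy for why transformer fine-tuning tends to be hungrier than CNN fine-tuning.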

2