
currentscurrents t1_j4jj1l6 wrote

It's a little discouraging when every interesting paper has a cluster of 64 A100s in their methods section.

6

junetwentyfirst2020 t1_j4jkejb wrote

The first image transformer paper is pretty clear that it works better at scale. You might not need a transformer for interesting work, though.

You can do so much with that GPU. I think transformers are heavier models, but my background is in CNNs, and those work fine on your GPU.

2