UnusualClimberBear t1_jbngux4 wrote

Training from scratch required 2048 A100s for 21 days, and that appears to have been just the final run.

I'd guess you can fine-tune it with far fewer resources; 16 A100s seems reasonable, since going lower would require quantization or partially loading the model. A rough sketch of what that looks like is below.
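For anyone curious, here's a minimal sketch of what "quantization or partial loading" means in practice, using Hugging Face transformers with bitsandbytes and accelerate installed. The checkpoint name is just a placeholder, and actual memory savings depend on your hardware:

```python
# Sketch: load a large causal LM with 8-bit quantization and automatic
# device placement, so it fits on fewer/smaller GPUs.
# Assumes: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-30b"  # placeholder; substitute your checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # bitsandbytes int8 quantization (~halves memory vs fp16)
    device_map="auto",   # shards layers across available GPUs, spilling to CPU if needed
)
```

With `device_map="auto"`, layers that don't fit on the GPUs get offloaded, which is the "partial loading" part; it trades memory for speed.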

7

potatoandleeks t1_jbnl6se wrote

Wow, they cost $15k apiece, so that's roughly $31 million (2048 × $15k) just for the GPUs! But since you only need them for 21 days, you can probably sell them on Craigslist afterwards.

6

SomewhereAtWork t1_jcf9g5p wrote

Would it be possible to train a quantized model?

1