Submitted by ButterscotchLost421 t3_yvmuuc in MachineLearning
samb-t t1_iwguscp wrote
Did you get the 1.3M number from the config file (config.training.n_iters = 1300001)? If so, that's the number of training steps, not epochs! So hopefully it's more like 7 hours to train on an A100, thank god!
ButterscotchLost421 OP t1_iwgwqq0 wrote
Ah yes, you're right! Thank you so much!
Does 7 secs per epoch sound approximately right to you?
samb-t t1_iwgz27t wrote
7 secs sounds very fast, but if you're not using a massive model, it's on CIFAR, and it's on an A100, then it's not implausible. You might want to double-check to be sure.
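
For anyone who wants to sanity-check the steps-vs-epochs arithmetic in this thread, here is a rough sketch. The n_iters value and the 7 secs per epoch come from the comments above. The batch size of 128 and the 50,000-image CIFAR-10 training set are assumptions, not something stated in the thread, so plug in your own config values. It also assumes the last partial batch counts as a step, which may not match how your data loader behaves.

```python
import math


def steps_to_epochs(n_iters: int, dataset_size: int, batch_size: int) -> float:
    """Convert a count of training steps (iterations) into epochs."""
    # One epoch is one pass over the data; assume the final partial batch is kept.
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return n_iters / steps_per_epoch


def estimated_hours(n_iters: int, dataset_size: int, batch_size: int,
                    secs_per_epoch: float) -> float:
    """Estimate total wall-clock training time in hours."""
    epochs = steps_to_epochs(n_iters, dataset_size, batch_size)
    return epochs * secs_per_epoch / 3600.0


N_ITERS = 1_300_001      # config.training.n_iters, from the thread
DATASET_SIZE = 50_000    # assumption: CIFAR-10 training split
BATCH_SIZE = 128         # assumption: not given in the thread
SECS_PER_EPOCH = 7.0     # measured by the OP, per the thread

epochs = steps_to_epochs(N_ITERS, DATASET_SIZE, BATCH_SIZE)
hours = estimated_hours(N_ITERS, DATASET_SIZE, BATCH_SIZE, SECS_PER_EPOCH)
print(f"{epochs:.0f} epochs, about {hours:.1f} hours")
```

Under those assumptions, 1.3M steps comes out to a few thousand epochs rather than 1.3M, and the total lands in the same ballpark as the 7 hours mentioned above. A different batch size changes the result proportionally.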