ButterscotchLost421
ButterscotchLost421 OP t1_iwggwv8 wrote
Reply to comment by yanivbl in [D] How long should it take to train a diffusion model on CIFAR-10? by ButterscotchLost421
Thank you! What do you mean by ADM? Adam?
When training in parallel, which technique did they use? Did they compute the gradient of a batch of size `N` on each device and then synchronize across all devices to get the mean gradient?
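To make the question concrete, here is a minimal sketch of the scheme I am describing (synchronous data parallelism with gradient averaging). It is a toy simulation in plain Python, not what the authors necessarily did: the "devices" are just list shards, the model is a scalar linear regression, and the names `mse_gradient` and `data_parallel_gradient` and the data are made up for illustration.

```python
# Toy simulation of synchronous data-parallel training (no real devices involved).
# Model: scalar linear regression y = w * x with mean-squared-error loss.

def mse_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_gradient(w, shards):
    """Each simulated device computes its local gradient; the results are averaged.

    The averaging step stands in for an all-reduce across devices.
    """
    local_grads = [mse_gradient(w, xs, ys) for xs, ys in shards]
    return sum(local_grads) / len(local_grads)

# Hypothetical data: 2 simulated devices, each holding a batch of size N = 4.
N = 4
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x + 1.0 for x in xs]
shards = [(xs[:N], ys[:N]), (xs[N:], ys[N:])]

w = 0.5
g_parallel = data_parallel_gradient(w, shards)
g_full = mse_gradient(w, xs, ys)

# With equal-sized shards, the mean of per-device gradients equals the
# gradient of the combined batch of size 2 * N.
assert abs(g_parallel - g_full) < 1e-9
```

If this is what they did, then with equal per-device batches the update would match single-device training with an effective batch size of `N` times the number of devices.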
ButterscotchLost421 OP t1_iwgwqq0 wrote
Reply to comment by samb-t in [D] How long should it take to train a diffusion model on CIFAR-10? by ButterscotchLost421
Ah yes, you're right! Thank you so much!
Does 7 secs per epoch sound approximately right to you?