Submitted by ButterscotchLost421 t3_yvmuuc in MachineLearning
ButterscotchLost421 OP t1_iwggwv8 wrote
Reply to comment by yanivbl in [D] How long should it take to train a diffusion model on CIFAR-10? by ButterscotchLost421
Thank you! What do you mean by ADM? Adam?
When training in parallel, which technique did they use? Did each device calculate the gradient of a batch of size `N`, and then all devices synchronize to get the mean gradient?
yanivbl t1_iwgjnht wrote
No, not Adam. I was referring to ADM, the model from the "Diffusion Models Beat GANs" paper.
I've never trained such a model, I've only read the paper. But yeah, it's most likely what you said (a.k.a. data parallelism).
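For what it's worth, here is a toy sketch of the scheme described above: each device computes a gradient on its own batch, the gradients are averaged, and every device applies the same update. This is not the paper's code. The 1-D model, the data, and the helper names (`grad_mse`, `all_reduce_mean`, `data_parallel_step`) are all made up for illustration, and the "devices" are just Python lists.

```python
# Toy simulation of data parallelism. No real devices are involved:
# each "device" is just a (xs, ys) shard held in a Python list.

def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def all_reduce_mean(values):
    """Stand-in for an all-reduce: returns the mean every device would receive."""
    return sum(values) / len(values)

def data_parallel_step(w, shards, lr):
    # Each "device" computes the gradient on its own batch of size N.
    local_grads = [grad_mse(w, xs, ys) for xs, ys in shards]
    # Synchronize: average the per-device gradients.
    g = all_reduce_mean(local_grads)
    # Every device applies the same update, so the weights stay identical.
    return w - lr * g

# Hypothetical data: 2 devices, N = 2 samples each, targets from y = 3x.
shards = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]
w0 = 0.0
w1 = data_parallel_step(w0, shards, lr=0.01)

# With equal shard sizes, the averaged gradient matches the gradient
# computed on the combined batch of size 2 * N.
all_xs = [x for xs, _ in shards for x in xs]
all_ys = [y for _, ys in shards for y in ys]
full_grad = grad_mse(w0, all_xs, all_ys)
```

So with equal per-device batch sizes, the result is the same as one step on a batch of size `N` times the number of devices. Real frameworks do the averaging with a collective operation across devices instead of a Python loop.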