Submitted by ButterscotchLost421 t3_yvmuuc in MachineLearning
Hey,
I am currently training a diffusion model on CIFAR.
The network is very similar to the code in the annotated diffusion model blog post (https://huggingface.co/blog/annotated-diffusion).
Checking Yang Song's code for CIFAR-10 (https://github.com/yang-song/score_sde), I see that the DM is trained for a staggering 1,300,000 epochs.
One epoch takes 7 seconds on my machine (NVIDIA A100-SXM4-40GB).
That would put overall training at roughly 2,500 hours, i.e. about a hundred days?
What am I doing wrong? Was the model trained on an even better GPU (and at what scale)? Or should an epoch of 50k examples take far less than 7 seconds? Or did this really train for a hundred days?
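For reference, here is the back-of-the-envelope arithmetic I'm doing (just my own measured numbers plugged in, nothing from the repo beyond the epoch count):

```python
# Rough wall-clock estimate for my setup (values are from my run, not from the repo).
seconds_per_epoch = 7            # measured on one A100-SXM4-40GB
num_epochs = 1_300_000           # figure I read out of the score_sde config

total_seconds = seconds_per_epoch * num_epochs
total_hours = total_seconds / 3600
total_days = total_hours / 24

print(f"{total_hours:,.0f} hours ~= {total_days:,.0f} days")
# -> 2,528 hours ~= 105 days
```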
yanivbl t1_iwgb683 wrote
They had more GPUs, training in parallel. Not sure about CIFAR-10, but I've read that the figure for ADM on ImageNet is ~1000 days on a single V100.
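Rough sketch of why that still finishes in reasonable wall-clock time once you spread it over many GPUs (this assumes near-linear data-parallel scaling, which is optimistic in practice):

```python
# Illustrative only: divide the single-GPU compute budget across N GPUs.
single_gpu_days = 1000   # ~1000 V100-days quoted above for ADM on ImageNet

for num_gpus in (1, 8, 64, 256):
    wall_clock_days = single_gpu_days / num_gpus
    print(f"{num_gpus:>3} GPUs -> ~{wall_clock_days:,.1f} days wall clock")
```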