Ne_Nel t1_j6va0z6 wrote on February 2, 2023 at 3:09 AM

Pixel vs Latent.

Mefaso t1_j6vdzji wrote on February 2, 2023 at 3:40 AM

Exactly, the entire point of Latent Diffusion Models was to make them smaller and faster.
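For a rough sense of the savings, here is a back-of-the-envelope sketch. It assumes a 512x512 RGB image and an autoencoder that downsamples by a factor of 8 into 4 latent channels (Stable Diffusion v1's setup; other LDMs use different factors and channel counts). It only counts the values each denoising step has to process, not actual FLOPs.

```python
def num_elements(height: int, width: int, channels: int) -> int:
    """Number of values the denoiser has to process at every diffusion step."""
    return height * width * channels

# Pixel-space diffusion directly on a 512x512 RGB image.
pixel = num_elements(512, 512, 3)

# Assumed LDM setup: downsampling factor f = 8 and 4 latent channels
# (Stable Diffusion v1's configuration; other models differ).
f = 8
latent = num_elements(512 // f, 512 // f, 4)

ratio = pixel / latent
print(pixel, latent, ratio)  # 786432 16384 48.0
```

This ignores the cost of running the autoencoder once at the start and end, and the denoiser's compute does not scale exactly with element count, so treat it as intuition rather than a speed benchmark.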

uhules t1_j6wrx63 wrote on February 2, 2023 at 1:16 PM

Except DALL-E 2 also applies diffusion in latent space, and Imagen performs diffusion in low-res pixel space. My initial hunch was the upscaling diffusion models, but they account for a relatively small portion of the total number of parameters and matter more for speed. The unsatisfying explanation is simply "SD does latent better", since you'd need an extensive ablation study to compare such different architectures.

Mefaso t1_j6z6zgt wrote on February 2, 2023 at 10:43 PM

>DALL-E 2 also applies diffusion in latent space

Not really, not in the important part. DALL-E 2 uses diffusion in CLIP "latent" space and then conditions the pixel-space diffusion model on the result.

However, they still do a full diffusion pass in pixel space, which is more computationally expensive than doing it in latent space, as LDMs do.