Ne_Nel t1_j6va0z6 wrote

Pixel vs Latent.

17

Mefaso t1_j6vdzji wrote

Exactly, the entire point of Latent Diffusion Models was to make them smaller and faster.

8

uhules t1_j6wrx63 wrote

Except DALL-E 2 also applies diffusion in latent space and Imagen performs diffusion in low-res pixel space. My initial hunch was the upscaling diffusion models, but they account for a relatively small portion of the total number of parameters and are more relevant speed-wise. The lackluster explanation is simply "SD does latent better", since you'd need to do an extensive ablation study to compare rather different architectures.

4

Mefaso t1_j6z6zgt wrote

>DALL-E 2 also applies diffusion in latent space

Not really in the important part. DALL-E 2 uses diffusion in CLIP-"latent"-space and then conditions the pixel-diffusion model on the result.

However, they still do a full diffusion pass in pixel-space, which is more complex than doing it in latent space, as LDMs do.
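
To put rough numbers on the pixel-vs-latent gap, here is a minimal sketch. The sizes are assumptions for illustration, not figures from this thread: a 512x512 RGB image, and an autoencoder that downsamples by 8x into 4 latent channels (the configuration commonly cited for Stable Diffusion v1). It only counts the values the denoiser has to process per step; it says nothing about parameter counts or actual runtime.

```python
def num_elements(height, width, channels):
    """Number of scalar values in a (height, width, channels) tensor."""
    return height * width * channels


def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the latent for an image, under the assumed autoencoder config."""
    return height // downsample, width // downsample, latent_channels


# Assumed example resolution: 512x512 RGB.
pixel = num_elements(512, 512, 3)

# Latent the diffusion model would operate on under the assumptions above.
lat_h, lat_w, lat_c = latent_shape(512, 512)
latent = num_elements(lat_h, lat_w, lat_c)

ratio = pixel / latent

print(f"pixel-space values per step:  {pixel}")
print(f"latent-space values per step: {latent} ({lat_h}x{lat_w}x{lat_c})")
print(f"reduction factor: {ratio:.0f}x")
```

Under these assumptions the denoiser sees about 48x fewer values per step in latent space, which is the kind of saving a full pixel-space diffusion pass gives up.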

1