jiamengial t1_j6mdcrj wrote

I don't think so. Diffusion models are based entirely on sampling methods; if anything, what's exciting is taking the "traditional" methods and, instead of replacing the whole thing with neural nets, replacing only one component of it.

−8

arg_max t1_j6mg664 wrote

I think diffusion models are kind of a bad example. The SDE paper from Yang Song has shown that it's all about modeling the score function, and this can't be done with simple models. Apart from that, the big text2img models work inside the latent space of a deep VAE, condition on the prompt using cross-attention (which isn't a thing in traditional ML), and use large language models to process the text input. All their components are very DL-based.

13

kpalan t1_j6n8otw wrote

Diffusion models mainly predict the added noise with deep convolutional neural networks.
Furthermore, Stable Diffusion specifically uses additional deep CNNs to downscale and upscale the image.
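
For anyone wondering what "predicting the added noise" means concretely, here is a minimal NumPy sketch of the standard DDPM-style training target. It is only an illustration, not Stable Diffusion's actual code: the function names, the `alpha_bar_t` argument (the cumulative noise-schedule term), and the `predict_noise` callable standing in for the deep CNN are all assumptions made for this example.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng):
    """Noise a clean sample: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)  # the "added noise"
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps

def noise_prediction_loss(predict_noise, x0, alpha_bar_t, rng):
    """Mean squared error between the true and the predicted noise.

    predict_noise is a hypothetical stand-in for the network; in a
    real model it would be a deep CNN (e.g. a U-Net).
    """
    x_t, eps = forward_noise(x0, alpha_bar_t, rng)
    eps_hat = predict_noise(x_t, alpha_bar_t)
    return float(np.mean((eps_hat - eps) ** 2))

def score_from_noise(eps, alpha_bar_t):
    """Score of q(x_t | x0) expressed via the noise: -eps / sqrt(1 - a)."""
    return -eps / np.sqrt(1.0 - alpha_bar_t)
```

The last helper shows why the two comments above agree: under this Gaussian noising, predicting the noise and modeling the score function differ only by a known scale factor.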

2