Sbadabam278 t1_iseer5o wrote on October 15, 2022 at 10:07 AM

Reply to comment by C0hentheBarbarian in [D] Simple Questions Thread by AutoModerator

Thank you for the resources, it is a nice explanation! However, I was looking for more of a technical understanding: which topics should I read up on in order to follow and understand the original paper?

Sbadabam278 t1_is75ja9 wrote on October 13, 2022 at 8:02 PM

Reply to [D] Simple Questions Thread by AutoModerator

How can I learn the theory behind diffusion models (and stable diffusion) properly?

I have read the papers, but to me they gloss over a huge amount of information and are hard to make sense of at the moment.

Let’s take the original diffusion paper, “Deep Unsupervised Learning using Nonequilibrium Thermodynamics”.

They start with a data point x0 and then apply a “Markov diffusion kernel” (aka adding a zero-mean Gaussian random variable) T times, until we converge to a fixed distribution (also normal). Then they want to learn a “reverse distribution” p that inverts the process, by learning the mean and variance of the reverse process distribution at each step.

So first of all, we already know the mean and variance of each step. Why are you trying to estimate them? Are we trying to find a “fake” mean and variance which push the stable state towards the “manifold” of realistic-looking data points? If so, some other things in the paper don’t make sense to me (things like “the forward and reversal process are identical if the variance is small” - wtf are you talking about)

Another point is: what is the significance of this process in the first place? The forward process is mathematically equivalent to just adding a single Gaussian random variable with higher variance. Why is having many steps important, and why can’t we learn to denoise directly from the final state in a single step?
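
To make the “single Gaussian” claim concrete, here is a minimal sketch of what I mean. It assumes a scalar x0 and a kernel of the form x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps with eps ~ N(0, 1), which is my reading of the Gaussian case and may not match the paper exactly. The constant beta schedule and the function names are made up for illustration. It only tracks the mean and variance of x_T given x0, and compares the step-by-step result with a single jump.

```python
import math

def stepwise_moments(x0, betas):
    """Mean and variance of x_T given x0, propagated one step at a time."""
    mean, var = x0, 0.0
    for beta in betas:
        # Assumed kernel: x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * eps
        mean = math.sqrt(1.0 - beta) * mean
        var = (1.0 - beta) * var + beta
    return mean, var

def single_shot_moments(x0, betas):
    """Same moments from one Gaussian: scale x0, add one larger noise term."""
    alpha_bar = math.prod(1.0 - beta for beta in betas)
    return math.sqrt(alpha_bar) * x0, 1.0 - alpha_bar

betas = [0.02] * 200  # hypothetical schedule, not taken from the paper
print(stepwise_moments(1.5, betas))
print(single_shot_moments(1.5, betas))
```

Under these assumptions the two pairs of numbers agree up to floating-point error, which is why the many-step forward process looks redundant to me. Note that this only covers the distribution of x_T given x0, not the joint distribution over the intermediate steps.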

There are many more questions I have about the paper, so my main question is: how do people make sense of it? I’m having a hard time even finding out which topics I should research.

I’m not an expert in probability / Markov chains / math in general, but I think I can say I’m not a complete newbie either. What background is one expected to have in order to read and understand these articles, and do you have any pointers on how to get there?

Thanks!