Submitted by ZeronixSama t3_yieq8c in MachineLearning
I was recently reviewing the diffusion methods used in Stable Diffusion and my mind wandered to Markov Chain Monte Carlo, which got me thinking - are there important theoretical similarities / differences between these methods?
A bit of background:
- Intro to Stable Diffusion: A nice illustrated guide by Jay Alammar https://jalammar.github.io/illustrated-stable-diffusion/
- Intro to MCMC: Stanford CS168 notes by Tim Roughgarden and Gregory Valiant http://timroughgarden.org/s17/l/l14.pdf
- The Metropolis-Hastings (MH) algorithm, a specific MCMC algorithm: https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
My own stream-of-consciousness thoughts:
- Function: Both diffusion and MH are sampling-based generative methods that produce data from a given target distribution.
- Iterative sampling: Diffusion works by predicting a de-noising term that is progressively applied to a random noise sample. This is similar to the MH proposal function g(x' | x), which generates the candidate next state in the Markov chain. The diffusion process converges to (an approximation of) the training distribution, whereas an MCMC chain converges to its stationary distribution. (A toy sketch of both loops appears after this list.)
- Biasing: Diffusion can be conditioned on exact boundary conditions, or 'guided' towards certain types of outputs by modifying the diffusion gradient, in the vein of Janner et al. In the same way, the MH algorithm can respect hard and soft constraints by configuring the acceptance ratio f(x') / f(x) to encode the desired properties of the stationary distribution.
- Overall, we may say that in the literature diffusion is largely 'learned' whereas MCMC is 'designed', but reversing that split may yield interesting results (learnable MCMC, or designed diffusion).
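To make the analogy concrete, here is a minimal NumPy sketch of both loops on a toy 1D two-mode target. It is only an illustration: the 'diffusion-like' loop is a Langevin-style sampler in which an analytic score function stands in for a learned denoiser network, the step sizes are hand-picked rather than a real noise schedule, and the names (log_f, score, mh_chain, langevin_samples) are just mine, not from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f(x):
    """Unnormalised log-density of a toy 1D two-mode Gaussian mixture (the target)."""
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def score(x):
    """Analytic d/dx log_f(x) -- stands in for a learned denoiser / score network."""
    w1 = np.exp(-0.5 * (x - 2.0) ** 2)
    w2 = np.exp(-0.5 * (x + 2.0) ** 2)
    return (-(x - 2.0) * w1 - (x + 2.0) * w2) / (w1 + w2)

# --- Metropolis-Hastings: proposal g(x'|x) plus acceptance ratio f(x')/f(x) ---
def mh_chain(n_steps=5000, step=1.0, log_bias=lambda x: 0.0):
    """Random-walk MH; log_bias folds a soft constraint directly into the target."""
    x, samples = 0.0, []
    for _ in range(n_steps):
        x_prop = x + step * rng.standard_normal()                # proposal g(x'|x)
        log_alpha = (log_f(x_prop) + log_bias(x_prop)) - (log_f(x) + log_bias(x))
        if np.log(rng.random()) < log_alpha:                     # accept / reject
            x = x_prop
        samples.append(x)
    return np.array(samples)

# --- Diffusion-flavoured sampling: start from noise, repeatedly nudge with a score ---
def langevin_samples(n_samples=5000, n_steps=200, step=0.05, guidance=lambda x: 0.0):
    """Langevin-style updates; guidance adds an extra gradient term (cf. guided diffusion)."""
    x = 3.0 * rng.standard_normal(n_samples)                     # pure-noise initialisation
    for _ in range(n_steps):
        grad = score(x) + guidance(x)                            # denoising direction (+ guidance)
        x = x + step * grad + np.sqrt(2.0 * step) * rng.standard_normal(n_samples)
    return x

mh = mh_chain()
dx = langevin_samples()
print("MH mean/std:      ", mh.mean(), mh.std())
print("Langevin mean/std:", dx.mean(), dx.std())

# Biasing both samplers toward the right-hand mode (x > 0):
soft_penalty = lambda x: -10.0 * np.maximum(0.0, -x)   # MH: penalty folded into the target
push_right = lambda x: 5.0 * (x < 0)                   # diffusion: extra gradient pushing right
print("Biased MH mean:      ", mh_chain(log_bias=soft_penalty).mean())
print("Guided Langevin mean:", langevin_samples(guidance=push_right).mean())
```

The parallel the sketch tries to show: the target enters MH through the acceptance ratio, and enters the diffusion-style loop through the gradient / denoising direction; the biasing then lands in the corresponding place (folded into the target for MH, added to the gradient for guidance).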
This thought raises a lot of important questions for me.
- Can we interpret diffusion as some variant of MCMC and therefore derive theoretical properties of it?
- The basic discussion above analyses MH and diffusion in terms of two properties: their iterative sampling and their biasing procedures. Can we 'mix-and-match' these to get new algorithms which might be better? (One existing combination of this kind is sketched after these questions.)
- What are the important theoretical differences between these two methods?
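On the 'mix-and-match' point: one combination that already exists is the Metropolis-adjusted Langevin algorithm (MALA), where the proposal is a gradient ('denoising'-style) step and the MH acceptance ratio corrects the discretisation error. A minimal, self-contained sketch on the same toy target as above (again, log_f, score and mala are illustrative names; in a 'learned MCMC' the analytic score would be replaced by a network):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_f(x):
    # same toy two-mode target as the previous sketch (unnormalised log-density)
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def score(x):
    # analytic gradient of log_f; in a 'learned MCMC' this is where a network would go
    w1 = np.exp(-0.5 * (x - 2.0) ** 2)
    w2 = np.exp(-0.5 * (x + 2.0) ** 2)
    return (-(x - 2.0) * w1 - (x + 2.0) * w2) / (w1 + w2)

def mala(n_steps=5000, step=0.2):
    """MALA: a gradient ('denoising'-style) proposal corrected by an MH acceptance step."""
    x, samples = 0.0, []
    for _ in range(n_steps):
        mean_fwd = x + step * score(x)                           # Langevin proposal mean
        x_prop = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal()
        mean_bwd = x_prop + step * score(x_prop)
        # log q(x'|x) and log q(x|x') for the Gaussian proposal with variance 2*step
        log_q_fwd = -((x_prop - mean_fwd) ** 2) / (4.0 * step)
        log_q_bwd = -((x - mean_bwd) ** 2) / (4.0 * step)
        log_alpha = (log_f(x_prop) - log_f(x)) + (log_q_bwd - log_q_fwd)
        if np.log(rng.random()) < log_alpha:                     # MH correction
            x = x_prop
        samples.append(x)
    return np.array(samples)

samples = mala()
print("MALA mean/std:", samples.mean(), samples.std())
```

Read this way, 'learnable MCMC' could mean: keep the acceptance correction, learn the proposal; and 'designed diffusion' would be the reverse.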
It's not clear to me what the answers are; I'm hoping to have a good discussion with the smart minds in this forum!
CatalyzeX_code_bot t1_iui8kht wrote
Found relevant code at https://diffusion-planning.github.io/ + all code implementations here