
maizeq t1_iujf76j wrote

The sampling method used with diffusion/score models is in fact a type of approximate MCMC. As another commenter mentioned, it’s the result of discretising (hence "approximate") an SDE whose drift is the score, i.e. the gradient of the log data probability under the model, and whose equilibrium distribution is the model’s distribution itself.

The advantage of Langevin sampling methods versus a method like random-walk Metropolis-Hastings is better efficiency (lower mixing time), because the gradient term reduces random-walk behaviour. It also scales better with higher dimensionality.
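To make that concrete: the unadjusted Langevin algorithm (ULA) just takes a gradient step on the log density plus Gaussian noise, with no accept/reject stage. A minimal sketch on a toy 1-D target (a standard normal, so the score is simply -x; the step size, chain length, and burn-in are arbitrary illustrative choices, not tuned values):

```python
import math
import random

def ula(grad_log_p, x0, step=0.1, n_steps=5000, seed=0):
    """Unadjusted Langevin algorithm: Euler-Maruyama discretisation of
    dx = (1/2) grad log p(x) dt + dW. Each step drifts uphill on log p
    and adds Gaussian noise, avoiding pure random-walk moves."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x = x + 0.5 * step * grad_log_p(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# toy target: standard normal, whose score is grad log p(x) = -x
samples = ula(lambda x: -x, x0=3.0)[1000:]  # drop burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Note the discretisation means the chain's stationary distribution is only approximately the target (the bias shrinks with the step size), which is exactly the "approximate MCMC" point above.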

What made modern diffusion/score-based models successful was combining this with a schedule of additive noise and conditioning the score model on the scale of that noise (the time step). This fixed several problems with the traditional score-matching objective, such as poor performance in low-density regions.
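A sketch of that annealed scheme, in the spirit of Song & Ermon's noise-conditional score networks: run Langevin steps at each noise level, largest first, with the score conditioned on sigma. Here the 1-D Gaussian toy target (whose noise-perturbed score is known in closed form), the geometric schedule, and the step-size constant `eps` are all illustrative assumptions, not the actual training setup:

```python
import math
import random

def annealed_langevin(score, sigmas, x0, eps=0.05, steps=100, rng=None):
    """Annealed Langevin sampling: Langevin steps at each noise level,
    largest sigma first, conditioning the score on sigma and scaling
    the step size with sigma^2 so coarse levels take big steps."""
    rng = rng or random.Random(0)
    x = x0
    for sigma in sigmas:          # anneal from high noise to low noise
        step = eps * sigma ** 2   # illustrative step-size rule
        for _ in range(steps):
            x = x + 0.5 * step * score(x, sigma) + math.sqrt(step) * rng.gauss(0.0, 1.0)
    return x

# toy data distribution N(0, 1); perturbed with noise sigma it becomes
# N(0, 1 + sigma^2), whose score is -x / (1 + sigma^2)
score = lambda x, sigma: -x / (1.0 + sigma ** 2)
sigmas = [3.0 * (0.1 / 3.0) ** (i / 9) for i in range(10)]  # geometric 3.0 -> 0.1
rng = random.Random(0)
xs = [annealed_langevin(score, sigmas, x0=rng.gauss(0.0, 3.0), rng=rng)
      for _ in range(500)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
```

The high-noise levels keep the score well-defined even far from the data, which is the fix for the low-density-region problem mentioned above.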

1

Red-Portal t1_iujl4c3 wrote

>It also scales better with higher dimensionality.

I believe this is not true. I am not aware of results showing that ULA is faster than MALA; in fact, I believe it's the complete opposite. This paper shows that MALA mixes much faster than ULA. Has the state of knowledge changed recently?
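For readers unfamiliar with the distinction: MALA is ULA plus a Metropolis-Hastings accept/reject correction, which removes the discretisation bias and makes the chain exact. A minimal sketch on a toy 1-D standard-normal target (step size and chain length are arbitrary illustrative choices):

```python
import math
import random

def mala_step(x, log_p, grad_log_p, step, rng):
    """One MALA step: a Langevin proposal corrected by a
    Metropolis-Hastings accept/reject, so the target is sampled exactly."""
    def drift_mean(y):
        return y + 0.5 * step * grad_log_p(y)
    prop = drift_mean(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
    # log of the Gaussian Langevin proposal density q(to | frm),
    # up to a constant that cancels in the acceptance ratio
    def log_q(to, frm):
        return -((to - drift_mean(frm)) ** 2) / (2.0 * step)
    log_alpha = log_p(prop) - log_p(x) + log_q(x, prop) - log_q(prop, x)
    if math.log(rng.random()) < log_alpha:
        return prop   # accept the proposal
    return x          # reject: stay at the current state

# toy target: standard normal, log p(x) = -x^2/2 (up to a constant)
rng = random.Random(0)
x, xs = 3.0, []
for _ in range(5000):
    x = mala_step(x, lambda z: -0.5 * z * z, lambda z: -z, step=0.5, rng=rng)
    xs.append(x)
xs = xs[1000:]  # drop burn-in
mean = sum(xs) / len(xs)
var = sum((s - mean) ** 2 for s in xs) / len(xs)
```

The accept/reject step is what separates the two algorithms being compared in this subthread: ULA trades that correction away for simplicity and accepts some stationary bias.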

1