dasayan05 t1_ivpmx7r wrote

There is no way to feasibly compute what you are asking for.

Diffusion Models (in fact, any modern generative model) are defined on a continuous image space, i.e. a continuous vector with 512x512 dimensions. This space is not discrete, so there isn't even a notion of "distinct images". A tiny continuous change can lead to another plausible image, and there are (theoretically) infinitely many tiny changes you can apply to an image to produce another image that looks the same but isn't the same point in image space.
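
A minimal sketch of that point (assuming a 512x512 grayscale image stored as a float array): an imperceptibly small perturbation yields a distinct point in image space that is visually indistinguishable from the original.

```python
import numpy as np

x = np.random.rand(512, 512).astype(np.float32)   # one point in continuous image space
x_perturbed = x + 1e-6 * np.random.randn(512, 512).astype(np.float32)

print(np.array_equal(x, x_perturbed))        # False: a different point in image space
print(np.abs(x - x_perturbed).max() < 1e-3)  # True: visually identical
```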

The (theoretically) correct answer to your question is that there are infinitely many images you can sample from a given generative model.

17

dasayan05 t1_it95xq7 wrote

IMO, the forward diffusion process isn't really a "process" -- it need not be sequential; it's parallelizable. The sole purpose of the forward process is simulating noisy data from a set of "noisy data distributions" crafted with a known set of noise scales -- that's it. SBMs and DDPMs both have this process. For SBMs, choosing the correct largest scale is again a heuristic hyperparameter: it must overpower the data variance so the process reaches an uninformative prior. For DDPMs, the process always reaches the prior due to the way the noise scales and attenuation coefficients are computed from \beta_t.
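
A sketch of why the forward process is parallelizable, using the standard DDPM closed form q(x_t | x_0) = N(\sqrt{\bar\alpha_t} x_0, (1 - \bar\alpha_t) I); the names and schedule here are illustrative:

```python
import torch

T = 1000
beta = torch.linspace(1e-4, 0.02, T)            # noise schedule \beta_t
alpha_bar = torch.cumprod(1.0 - beta, dim=0)    # \bar{\alpha}_t

def q_sample(x0, t, eps):
    # jump directly to x_t without simulating steps 1..t-1
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * eps

x0 = torch.randn(8, 3, 32, 32)                  # a toy batch of "clean" images
t = torch.randint(0, T, (8,))                   # independent timesteps per sample
xt = q_sample(x0, t, torch.randn_like(x0))      # all of them noised in parallel
```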

Agree with your second part. The SDE formulation is good -- it basically puts SBMs into a stronger theoretical framework. SDEs offer a reverse process which is analytic and in which the score appears naturally -- i.e. again not many hyperparameters.

1

dasayan05 t1_it6u4ho wrote

To clarify, "score matching" itself is quite theoretically grounded -- what is not is the link between score matching and Langevin dynamics: the two are not theoretically "coupled". Langevin dynamics is chosen more as an intuitive way of "using" the score estimates. Moreover, Langevin dynamics theoretically takes infinite time to reach the true distribution, and its convergence depends on a proper choice of `\delta`, a tiny number that acts like a step size.

x_{t-1} = x_t + (\delta / 2) s(x_t, t) + \sqrt{\delta} z
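
A minimal sketch of that Langevin update (assuming a hypothetical score network `s(x, t)`; `delta` and the number of steps per noise scale are exactly the hyperparameters being discussed):

```python
import torch

def langevin_sample(s, x, t, delta=1e-5, n_steps=100):
    # run n_steps of Langevin dynamics at noise scale t, using score estimates s(x, t)
    for _ in range(n_steps):
        z = torch.randn_like(x)
        x = x + 0.5 * delta * s(x, t) + (delta ** 0.5) * z
    return x
```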

Now, look at DDPM. DDPM's training objective is totally "coupled" with its sampling process -- it all comes from very standard calculations on the underlying PGM (probabilistic graphical model). Notice that DDPM's reverse process does not involve a hyperparameter like `\delta`; everything is tied to the known \beta schedule -- which tells you exactly what step size to take in order to converge in finitely many (T) steps. DDPM's reverse process is not Langevin dynamics -- it just looks like it, but has a stronger guarantee on convergence.
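
For comparison, a sketch of one DDPM reverse step following Ho et al.'s Algorithm 2 (`eps_model` is a hypothetical noise-prediction network; every coefficient comes from the \beta schedule rather than from a free `\delta`):

```python
import torch

def ddpm_reverse_step(eps_model, x_t, t, beta, alpha_bar):
    # one ancestral sampling step x_t -> x_{t-1}; step size fixed by the \beta schedule
    alpha_t = 1.0 - beta[t]
    coef = beta[t] / (1.0 - alpha_bar[t]).sqrt()
    mean = (x_t - coef * eps_model(x_t, t)) / alpha_t.sqrt()
    if t == 0:
        return mean
    z = torch.randn_like(x_t)
    return mean + beta[t].sqrt() * z   # \sigma_t^2 = \beta_t choice from the paper
```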

This makes it more robust compared to score-based Langevin dynamics.

1

dasayan05 t1_it46pby wrote

>... these seem to be two dominant approaches ...

Totally. There are two streams of ideas, similar but not exactly equivalent, namely Score-Based Models (SBM) and Denoising Diffusion Probabilistic Models (DDPM). There is an effort to unify the two under the umbrella of Stochastic Differential Equations (SDE), where SBM -> "Variance Exploding SDE" and DDPM -> "Variance Preserving SDE". By far, DDPM is more popular -- the reason is that DDPM has stronger theoretical guarantees and fewer hyperparameters. SBMs are, in some parts, intuitive and observation-based.

>.. they learn the noise rather than the score ..

Yes. SBMs use the "score" while DDPMs use "noise estimates"; but they are related -- "score = -eps / noise-std"; see CVPR22's Diffusion slides (slide 57). IMO, the major difference between SBM and DDPM is their forward noising process -- SBM only adds noise, whereas DDPM adds noise and also attenuates the signal, and this process is systematically "tied" to the noise schedule \beta_t. This makes the reverse process look slightly different.
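
A one-liner sketch of that relation, assuming a hypothetical noise estimator `eps_model` and the DDPM marginal std \sqrt{1 - \bar\alpha_t}:

```python
def score_from_eps(eps_model, x_t, t, alpha_bar):
    # score of the noised marginal: s(x_t, t) = -eps / std, with std = sqrt(1 - alpha_bar_t)
    return -eps_model(x_t, t) / (1.0 - alpha_bar[t]).sqrt()
```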

If you want to implement Diffusion Models, start with DDPM as formulated by Ho et al. I have never seen an algorithm written as clearly as Ho et al.'s Algorithms 1 & 2. It can't get any simpler in terms of implementation.
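
To give a flavour, here is a sketch of the training step in the spirit of Ho et al.'s Algorithm 1 (`eps_model` is a hypothetical noise-prediction network; the loss is the simple \epsilon-MSE objective):

```python
import torch
import torch.nn.functional as F

def ddpm_training_loss(eps_model, x0, alpha_bar):
    # Algorithm 1: sample t and eps, noise x0 in closed form, regress the noise
    T = alpha_bar.shape[0]
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    eps = torch.randn_like(x0)
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    x_t = a * x0 + s * eps
    return F.mse_loss(eps_model(x_t, t), eps)
```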

9

dasayan05 t1_irrfd7w wrote

It doesn't matter which one you implement. Trying to implement anything from scratch always exposes you to deeper insights that are hard to get by looking at dry mathematics on paper. Just one piece of advice: pick a paper/algorithm that is well known to work and is reproducible. Then you are good.

6

dasayan05 t1_ir9d9s3 wrote

The first part of your statement is correct -- that is called the "forward process" and it is only needed at training time.

Yes, the encoder in DDIM basically adds predicted noise to travel back to x_T -- it's more like the "reverse of the reverse process", but we can't really call it the "forward process", can we? For example, the true "forward process" is almost entirely random and you can skip to any x_t by re-parameterization. This isn't true for DDIM's "reverse of the reverse process" -- it must be sequential and deterministic.

1

dasayan05 t1_ir4kcfg wrote

Sorry, but that's not really the correct interpretation. The "forward process" is not the encoder -- it's a stochastic process. The encoder is the "reverse of eq. 14", i.e. integrating the ODE in eq. 14 backwards in time -- which is not the same as the "forward process".

1

dasayan05 t1_iqy7x8z wrote

Yes, you get the noise from the U-Net itself.

The idea behind DDIM is to make the reverse process deterministic, i.e. converting the SDE into an ODE (eq. 14). That said, an ODE can be integrated backwards in time starting from the final solution (the clean image x_0), integrating with negative `dt`, and arriving at the noise (i.e. "encoded feature") x_T. Thus, you get a negative sign in front of the noise estimator `\epsilon_{\theta}` and then treat it like a normal ODE and integrate from end time (t=0) to start time (t=T).
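
A sketch of the deterministic DDIM update (\eta = 0) under the usual \bar\alpha parameterization; `eps_model` and the argument names are assumptions. Running the same rule with t_prev > t walks the ODE backwards, i.e. the "encoder" x_0 -> x_T.

```python
import torch

def ddim_step(eps_model, x_t, t, t_prev, alpha_bar):
    # deterministic DDIM update x_t -> x_{t_prev}; choosing t_prev > t inverts the ODE
    eps = eps_model(x_t, t)
    x0_pred = (x_t - (1.0 - alpha_bar[t]).sqrt() * eps) / alpha_bar[t].sqrt()
    return alpha_bar[t_prev].sqrt() * x0_pred + (1.0 - alpha_bar[t_prev]).sqrt() * eps
```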

2