Submitted by [deleted] t3_xu4rxp in MachineLearning
golljj t1_ir4few1 wrote
What I think is that you should treat the DDIM as an auto-encoder: the encoder encodes the clean image $x_0$ to $x_t$ via the forward process, and the reverse process is a decoder that decodes $x_t$ to $\hat{x}_0$. Note that the noisy $x_t$ is not pure Gaussian noise, but the image $x_0$ with some Gaussian noise added.
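For concreteness, a minimal sketch of that forward process in code (the schedule values are illustrative; the re-parameterized form $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ is the standard one):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # illustrative linear beta schedule
alphas_bar = np.cumprod(1.0 - betas)  # \bar{\alpha}_t = \prod_s (1 - \beta_s)

def q_sample(x0, t, eps=None):
    """Sample x_t ~ q(x_t | x_0) in one shot via re-parameterization:
    x_t is x_0 scaled down plus scaled Gaussian noise, not pure noise."""
    if eps is None:
        eps = np.random.randn(*x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
```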
dasayan05 t1_ir4kcfg wrote
Sorry, but that's not really the correct interpretation. The "forward process" is not the encoder -- it's a stochastic process. The encoder is the "reverse of eq.14", which means integrating the ODE in eq.14 backwards in time -- that is not the same as the "forward process".
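As a hedged sketch of what I mean, here is one deterministic DDIM update ($\eta = 0$), where `eps_model` is a hypothetical stand-in for the trained noise predictor $\epsilon_{\theta}(x_t, t)$ and `alphas_bar` is the cumulative product $\bar{\alpha}_t$ as in the snippet above. Calling it with `t_next < t` is the decoder; calling it with `t_next > t` integrates the same ODE backwards in time, i.e. the "encoder":

```python
def ddim_step(x_t, t, t_next, eps_model):
    """One deterministic DDIM step (eta = 0), usable in either direction."""
    eps = eps_model(x_t, t)
    # Predicted clean image \hat{x}_0 from the current state
    x0_hat = (x_t - np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas_bar[t])
    # Move along the ODE trajectory to t_next, reusing the same eps estimate
    return (np.sqrt(alphas_bar[t_next]) * x0_hat
            + np.sqrt(1.0 - alphas_bar[t_next]) * eps)
```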
golljj t1_ir930ya wrote
Do you mean that the forward process only adds pure Gaussian noise with different $\beta$ to $x_0$ to obtain $x_T$, while the encoder here adds the noise predicted by the estimator $\epsilon_{\theta}(x_t)$ to $x_t$?
dasayan05 t1_ir9d9s3 wrote
The first part of your statement is correct -- that is called the "forward process", and it is only needed at training time.
Yes, the encoder in DDIM basically adds a predicted noise to travel back to $x_T$ -- it's more like the "reverse of the reverse process", but we can't really call it the "forward process", can we? For example, the true "forward process" is almost entirely random, and you can skip to any $x_t$ by re-parameterization. This isn't true for DDIM's "reverse of the reverse process" -- it must be sequential and deterministic.
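To make the contrast concrete, a sketch reusing `T`, `alphas_bar`, `q_sample`, and `ddim_step` from the snippets above, with a stub in place of the trained noise predictor:

```python
x0 = np.random.randn(3, 32, 32)            # placeholder "clean image"
eps_model = lambda x, t: np.zeros_like(x)  # stub for the trained predictor

# Forward process: jump straight to any timestep with one random draw
x_500 = q_sample(x0, t=500)

# DDIM encoding: must visit every intermediate step in order, and is
# fully deterministic -- the "reverse of the reverse process"
x = x0
for t in range(T - 1):
    x = ddim_step(x, t, t + 1, eps_model)
```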
golljj t1_ir9r834 wrote
Got it, thanks