golljj

golljj t1_ir930ya wrote

Do you mean that the forward process only adds pure Gaussian noise (with a different $\beta$ at each step) to $x_0$ to obtain $x_T$, whereas the encoder here adds the noise predicted by the estimator $\epsilon_{\theta}(x_t)$ to $x_t$?
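To make the distinction in that question concrete, here is a minimal sketch in plain Python, assuming a linear $\beta$ schedule and the usual $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The forward process samples $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with fresh noise and no network, while a deterministic DDIM inversion step ($\eta = 0$) computes $\hat{x}_0$ from $x_t$ and re-adds the *predicted* noise $\epsilon_\theta(x_t)$ at the next noise level. `eps_model` is a hypothetical stand-in for $\epsilon_\theta$, not any real library API.

```python
import math
import random


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


def forward_sample(x0, abar_t, rng):
    """Stochastic forward process q(x_t | x_0): fresh Gaussian noise, no network involved."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return [math.sqrt(abar_t) * a + math.sqrt(1.0 - abar_t) * e for a, e in zip(x0, eps)]


def ddim_inversion_step(x_t, abar_t, abar_next, eps_model):
    """Deterministic DDIM inversion x_t -> x_{t+1} (eta = 0).

    eps_model is a hypothetical noise estimator; its prediction at x_t is reused
    as an approximation of the noise at the next level.
    """
    eps = eps_model(x_t)
    # Estimate the clean image implied by x_t and the predicted noise.
    x0_hat = [(x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t) for x, e in zip(x_t, eps)]
    # Move to the next (noisier) level along the same predicted-noise direction.
    return [math.sqrt(abar_next) * a + math.sqrt(1.0 - abar_next) * e for a, e in zip(x0_hat, eps)]
```

Calling `forward_sample` twice with different random generators gives different $x_t$, whereas `ddim_inversion_step` returns the same output for the same input, which is what makes the DDIM "encoder" deterministic.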

golljj t1_ir4few1 wrote

I think you should treat DDIM as an autoencoder: the encoder maps the clean image $x_0$ to $x_t$ via the forward process, and the reverse process is a decoder that maps $x_t$ back to $\hat{x}_0$. Note that the noisy $x_t$ is not pure Gaussian noise; it is the image $x_0$ with some Gaussian noise added.
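A minimal sketch of that autoencoder view, assuming a linear $\beta$ schedule and writing the encoder as $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The coefficient $\sqrt{\bar\alpha_t}$ is the fraction of $x_0$ still present in $x_t$, so for intermediate $t$ the "code" clearly still contains the image. The decoder runs the deterministic DDIM reverse chain ($\eta = 0$). `eps_model` is hypothetical; in the check below it is an oracle that returns the true noise, which is an idealization a trained $\epsilon_\theta$ only approximates.

```python
import math


def encode(x0, abar_t, eps):
    """Encoder: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps (image plus noise)."""
    return [math.sqrt(abar_t) * a + math.sqrt(1.0 - abar_t) * e for a, e in zip(x0, eps)]


def ddim_decode(x_t, abars, eps_model):
    """Decoder: deterministic DDIM reverse chain (eta = 0) from abars[-1] down to x0_hat.

    eps_model(x, i) is a hypothetical noise estimator for step index i.
    """
    x = list(x_t)
    for i in range(len(abars) - 1, -1, -1):
        ab = abars[i]
        ab_prev = abars[i - 1] if i > 0 else 1.0  # alpha_bar = 1 means a clean image
        eps = eps_model(x, i)
        x0_hat = [(v - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for v, e in zip(x, eps)]
        x = [math.sqrt(ab_prev) * a + math.sqrt(1.0 - ab_prev) * e for a, e in zip(x0_hat, eps)]
    return x


def signal_coefficient(abar_t):
    """How much of x_0 survives in x_t; it shrinks with t but stays nonzero."""
    return math.sqrt(abar_t)
```

With a perfect noise estimate, `ddim_decode(encode(x0, ...), ...)` recovers `x0` up to floating-point error; with a learned $\epsilon_\theta$ you get an approximation $\hat{x}_0$ instead.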
