WallabyDue2778 OP t1_it4zvey wrote
Reply to comment by dasayan05 in [D] DDPM vs Score Matching by WallabyDue2778
Thank you for your reply.
I totally agree that DDPM would be simpler to implement (I've never done it, but it seems more straightforward). My impression, though, was that score matching is more theoretically grounded than DDPM. The derivation of the score matching objective, from the gradient-ascent-like Langevin dynamics down to the various approximations of the “target term” inside the norm (like grad_x_tilde log q(x_tilde|x)), feels more sound to me than DDPM. DDPM, to me, felt like: let's add noise and attenuate, assume the reverse process is also Gaussian, just use a model to learn it, and since learning the noise turned out to work better than learning the mean, for whatever reason, let's do that.
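To check my own understanding, here is the rough correspondence I have in mind (just a sketch, assuming the usual Gaussian perturbation kernel; $\sigma$ is the noise level and $\bar{\alpha}_t$ the DDPM schedule, notation I'm introducing here rather than quoting from either paper):

$$
q(\tilde{x} \mid x) = \mathcal{N}(\tilde{x};\, x,\, \sigma^2 I)
\;\Rightarrow\;
\nabla_{\tilde{x}} \log q(\tilde{x} \mid x) = -\frac{\tilde{x} - x}{\sigma^2} = -\frac{\epsilon}{\sigma},
\qquad \tilde{x} = x + \sigma \epsilon,\ \epsilon \sim \mathcal{N}(0, I)
$$

so the denoising score matching loss becomes

$$
\mathbb{E}\,\big\| s_\theta(\tilde{x}, \sigma) - \nabla_{\tilde{x}} \log q(\tilde{x} \mid x) \big\|^2
= \mathbb{E}\,\Big\| s_\theta(\tilde{x}, \sigma) + \tfrac{\epsilon}{\sigma} \Big\|^2,
$$

while the DDPM loss is

$$
\mathbb{E}\,\big\| \epsilon_\theta(x_t, t) - \epsilon \big\|^2,
\qquad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
$$

with the two related by $s_\theta(x_t, t) = -\epsilon_\theta(x_t, t) / \sqrt{1 - \bar{\alpha}_t}$. If I have this right, the ε-prediction loss is the denoising score matching loss up to a per-timestep rescaling, so “learning the noise” is effectively estimating the score of the perturbed data.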
(I don't mean to belittle the authors' work. I could never have derived these results or carried out this kind of research myself.)
I do admit that my impression may come from not understanding the derivation of those approximations in score matching, and it's highly likely I don't know what I'm talking about regarding DDPM.
Would you please give an example of where SBM is intuitive and observation-based? The first paper, where they discussed a bunch of pitfalls and then arrived at multiple noise levels and a noise-conditioned model, seems that way to me.