Submitted by WallabyDue2778 t3_y92tln in MachineLearning
Red-Portal t1_it5ciyb wrote
Reply to comment by UncleVesem1r in [D] DDPM vs Score Matching by WallabyDue2778
DDPM doesn't aim to produce anything related to Langevin sampling. However, its objective function is equivalent to the KL divergence between the "true" Gaussian (the forward-process posterior) and the neural-network-parameterized reverse Gaussian. Thus, as long as SGD actually optimizes the DDPM objective, you'll get something that is close to the true Gaussian in KL divergence. The problem is that learning the noise with MSE kind of ruins all of this...
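(For reference, a minimal sketch of the simplified DDPM training loss from Ho et al. 2020, where the per-step KL terms reduce to a noise-prediction MSE. The `model`, `alpha_bar`, and shapes here are illustrative placeholders, not from any particular codebase.)

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, alpha_bar):
    """Simplified DDPM objective: MSE between the true and predicted noise.

    model     -- a network eps_theta(x_t, t) (placeholder)
    x0        -- batch of clean data, shape (B, ...)
    alpha_bar -- 1-D tensor of cumulative products of (1 - beta_t)
    """
    B = x0.shape[0]
    t = torch.randint(0, len(alpha_bar), (B,), device=x0.device)   # random timesteps
    a = alpha_bar[t].view(B, *([1] * (x0.dim() - 1)))              # broadcast to x0's shape
    eps = torch.randn_like(x0)                                     # forward-process noise
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps                     # sample from q(x_t | x_0)
    return F.mse_loss(model(x_t, t), eps)                          # noise-prediction MSE
```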
UncleVesem1r t1_it5rffe wrote
I see! I understand why DDPM is good now. I should go back to the paper and pay more attention to the KL divergence part of it.
If I could borrow a few more minutes of your time, could you explain more about what's not as good about score matching?
So to be explicit, my understanding of Langevin sampling is correct, i.e., if there's a model that can accurately model the score function, one should be able to recover the true data distribution. If this is true, then I guess the criticism of SM is about its objective function, i.e., there's no guarantee that it leads to an accurate score function? But aren't the score matching algorithms (denoising, projection) supposed to be able to solve the objective involving grad_x log p(x)?
Or perhaps Langevin sampling is the problem. The paper does say that with small enough noise and enough steps, we would end up with an exact sample from the data distribution. Yet if we don't have small enough steps and enough of them, perhaps we end up somewhere, but there's no guarantee it's the true data distribution?
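(For concreteness, a rough sketch of unadjusted Langevin sampling with a learned score function; the `score` network, step size, and number of steps are placeholders for illustration only.)

```python
import torch

@torch.no_grad()
def langevin_sample(score, x, step_size=1e-4, n_steps=1000):
    """Unadjusted Langevin dynamics: x <- x + (eta/2) * score(x) + sqrt(eta) * z.

    score -- a network approximating grad_x log p(x) (placeholder)
    x     -- initial samples, e.g. drawn from a simple prior
    """
    for _ in range(n_steps):
        z = torch.randn_like(x)
        x = x + 0.5 * step_size * score(x) + (step_size ** 0.5) * z
    return x  # only exact in the limit of small step sizes and many iterations
```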
I really appreciate this! Thanks again.
Red-Portal t1_it5v31k wrote
>there's no guarantee that it leads to accurate score function? But aren't the score matching algorithms (denoising, projection) supposed to be able to solve the objective function involving grad_x log p(x)?
Oh no, it's not. All it's doing is minimizing the mean-squared error against the score function. Minimizing this objective does not mean that sampling with the learned score function will work well; in fact, it doesn't. This is exactly why score modelling has to rely on adding noise, and by doing that, they converged to DDPM.
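(To make the "MSE against the score function" point concrete, a rough sketch of the denoising score matching objective in the style of Vincent 2011; the `score_net` signature and the noise level `sigma` are assumptions for illustration.)

```python
import torch
import torch.nn.functional as F

def denoising_score_matching_loss(score_net, x, sigma=0.1):
    """Denoising score matching: regress the score of the noised conditional.

    For x_tilde = x + sigma * z, the target grad_{x_tilde} log q(x_tilde | x)
    is -(x_tilde - x) / sigma**2. Matching it in MSE only controls the score
    estimate where the (noised) data actually has mass, which is why sampling
    from it alone can fail in low-density regions.
    """
    z = torch.randn_like(x)
    x_tilde = x + sigma * z
    target = -(x_tilde - x) / sigma ** 2
    return F.mse_loss(score_net(x_tilde, sigma), target)
```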
UncleVesem1r t1_it5wdui wrote
Very cool! I think the pitfalls mentioned in the SM paper also make more sense now.
Thank you kind sir/madam