Submitted by CurrentlyJoblessFML t3_10g1tni in MachineLearning
Hi all,
I am trying to see if I can use a DDPM (Denoising Diffusion Probabilistic Model) to denoise images with a supervised learning approach. However, from what I've read, the original DDPM formulation is for unconditional image generation. Has anyone worked with conditional DDPMs who could help me with some conceptual questions?
Here's what I'm trying to understand:
- Say I have a pair of a noisy image and its clean ground-truth counterpart.
- Should I take the clean image and gradually corrupt it by adding Gaussian noise in the forward diffusion (FD) process?
- Could I get the network to learn the reverse diffusion process by giving it the noisy input, the FD-noised image, and positional embeddings? I was planning on concatenating the noisy input with the FD-noised image.
- During training, the network learns to predict the noise at step t-1 given the image at step t, conditioned on the noisy source image (see the sketch after this list).
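To make the setup concrete, here is a minimal sketch of one training step under this scheme. The UNet `model`, the linear noise schedule, and the plain MSE loss are all assumptions for illustration, not a reference implementation:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t

def training_step(model, clean, noisy_source):
    """clean, noisy_source: paired (B, C, H, W) tensors."""
    b = clean.shape[0]
    t = torch.randint(0, T, (b,), device=clean.device)  # one random timestep per sample
    eps = torch.randn_like(clean)                       # the noise the network must predict

    # Forward diffusion in closed form: x_t = sqrt(abar_t)*x_0 + sqrt(1-abar_t)*eps
    abar = alphas_cumprod.to(clean.device)[t].view(b, 1, 1, 1)
    x_t = abar.sqrt() * clean + (1.0 - abar).sqrt() * eps

    # Conditioning: concatenate the noisy source with x_t along channels;
    # t is passed separately for the timestep (positional) embedding.
    eps_pred = model(torch.cat([noisy_source, x_t], dim=1), t)
    return F.mse_loss(eps_pred, eps)
```

The only architectural change from an unconditional DDPM is that the UNet's input channel count doubles to accept the concatenated pair.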
Here is an image showing what I mean: [Figure: DDPM for image denoising]. Any thoughts or suggestions would be greatly appreciated.
samb-t t1_j50gpn4 wrote
I think what you're looking for is Palette, which does paired image-to-image translation with conditional diffusion models. I believe that approach is exactly what you're describing: concatenating along the channel dimension.
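For reference, a hypothetical sketch of how that channel-wise conditioning looks at sampling time, reusing the `model`, `T`, `betas`, and `alphas_cumprod` assumed in the training sketch above; the source image is concatenated in at every reverse step:

```python
import torch

@torch.no_grad()
def sample(model, noisy_source):
    """Palette-style conditional DDPM sampling (sketch, not the official code)."""
    alphas = 1.0 - betas
    x = torch.randn_like(noisy_source)  # start from pure Gaussian noise
    for i in reversed(range(T)):
        t = torch.full((x.shape[0],), i, device=x.device, dtype=torch.long)
        # Condition on the source image at every step via channel concatenation
        eps_pred = model(torch.cat([noisy_source, x], dim=1), t)
        # DDPM posterior mean: (x_t - beta_t/sqrt(1-abar_t)*eps) / sqrt(alpha_t)
        x = (x - betas[i] / (1.0 - alphas_cumprod[i]).sqrt() * eps_pred) / alphas[i].sqrt()
        if i > 0:  # add noise at every step except the last
            x = x + betas[i].sqrt() * torch.randn_like(x)
    return x
```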