Diffusion models are trained on a image sequences, where in each sequence, the image is progressively corrupted with noise; given image N, add noise to produce N+1.

The diffusion model learns to reverse the corruption by one step; given image N, predict N-1.

Could other forms of corruption be used instead of uniform noise? Examples:

Compression artifacts
Perlin noise
Uniform noise applied inconsistently across the image
Bad camera exposure
Banding due to low bit depth
Gaussian blur
Pixelation, aliasing, or other sampling artifacts
Motion blur
Color transformations
Sequences of corruptions where the choice of degradation is different for each step

Or more complex examples, perhaps for training a model to change the semantics of an image, or repair incoherent outputs:

Patches from other images
Images with incorrect labels blended in
Scrambling image patches with random transformations
Sequences of outputs from GANs produced as its training progressed (but with the same seed)
DeepDream iterations

Or, in general, any distortion with these properties:

Cheap to produce or assemble sequences for
Causes the image to become more out-of-distribution from the uncorrupted image dataset for the given prompt

The motivation for asking is, if there isn't something "special" about noise and any drop-in corruption could be used, a diffusion model could:

Be used as a blind image restoration; make an image "better" by human measure without changing it significantly.
Tweak the content of an image without removing unecessary details with noise; make an image match a prompt with minimal changes.

If there is something "special" about noise (e.g. the model or training procedure makes certain assumptions that depend on noise), what is special about noise, and how can diffusion models be modified to handle more general corruptions?

Thanks!

Comments

feliximo t1_iscxjm1 wrote on October 15, 2022 at 12:23 AM

Already a paper on this, they call it Cold Diffusion.

zergling103 OP t1_isczxzg wrote on October 15, 2022 at 12:43 AM

After quickly skimming through the paper, it appears that they use multiple models, one model per type of image degredation. I was hoping to learn about a single general model that can reverse any squence of degredation. Perhaps it'd have better performance; for example the de-blurring cold diffusion model seems to produce outputs that lack detail.

mrtransisteur t1_isvjfeu wrote on October 19, 2022 at 12:30 AM

Sounds like you've got a research paper topic brewing!

zergling103 OP t1_iszme0k wrote on October 19, 2022 at 9:20 PM

I have all sorts of ideas, but little means to test them ::(

zergling103 OP t1_iscyeca wrote on October 15, 2022 at 12:30 AM

Thanks! I'll check it out

bdubbs09 t1_isdb32q wrote on October 15, 2022 at 2:14 AM

There’s a paper that uses diffusion to essentially erase adversarial attacks/patches. I’m on mobile so I can’t link it but it’s an interesting application for sure

Towzeur t1_isefejd wrote on October 15, 2022 at 10:16 AM

(Certified!!) Adversarial Robustness for Free!

ThrowThisShitAway10 t1_isdsaq4 wrote on October 15, 2022 at 4:58 AM

As others have mentioned, Cold Diffusion proved this.

zergling103 OP t1_isdxd7x wrote on October 15, 2022 at 5:59 AM

If I understand correctly, Cold Diffusion, like the original diffusion network, assumes the perturbations are made in pixel space. That is, noise or other corruptions are added to or removed from individual RGB values for each pixel.

Latent diffusion models seem to perform better. They encode the image using a pretrained autoencoder, then the perturbations are added to or removed from the latent vectors. The network trains to take steps in a model's latent space instead of in pixel space.

However, the latent space of an autoencoder is a kind of information bottleneck, so you wouldn't be able to use it to encode real-world degradation perfectly, or make lossless tweaks to a given image you want to restore.

I wonder if the two concepts can be merged somehow? A lossless autoencoder?

Prinzessid t1_iseb58v wrote on October 15, 2022 at 9:14 AM

I think you can train a denoising autoencoder without a bottleneck.

danja t1_isfxrfl wrote on October 15, 2022 at 6:00 PM

Seems like there are maybe 3+ tangential problems here. Noise is one. Then for your 'simple' list, most of those are the result of direct non-linear transformations, I would imagine an old-school mix of convolution and trad neural nets could come up with their inverses fairly efficiently. The 'complex' list - hmm, the word Deep springs to mind ...