Submitted by zergling103 t3_y49i4s in MachineLearning

Diffusion models are trained on image sequences in which an image is progressively corrupted with noise: given image N, add noise to produce image N+1.

The diffusion model learns to reverse the corruption by one step; given image N, predict N-1.
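
To make the setup above concrete, here is a minimal sketch of how such a sequence and its training pairs could be assembled. It mirrors the simplified description in this post, not the exact DDPM formulation (which also rescales the image at each step and usually trains the network to predict the noise). The names `make_noise_sequence` and `training_pairs`, the 8x8 stand-in image, and `noise_std=0.1` are all illustrative assumptions.

```python
import numpy as np

def make_noise_sequence(image, num_steps, noise_std=0.1, seed=0):
    """Return [x_0, ..., x_num_steps], where x_{n+1} = x_n + Gaussian noise."""
    rng = np.random.default_rng(seed)
    sequence = [image.astype(np.float64)]
    for _ in range(num_steps):
        noisy = sequence[-1] + rng.normal(0.0, noise_std, size=image.shape)
        sequence.append(noisy)
    return sequence

def training_pairs(sequence):
    """Pair each corrupted image x_n (input) with x_{n-1} (target), for n >= 1."""
    return [(sequence[n], sequence[n - 1]) for n in range(1, len(sequence))]

image = np.zeros((8, 8))  # stand-in for a real image (assumption)
seq = make_noise_sequence(image, num_steps=5)
pairs = training_pairs(seq)  # (input, target) = (image N, image N-1)
```

A model trained on `pairs` would learn the one-step reversal described above.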

Could other forms of corruption be used instead of uniform noise? Examples:

  • Compression artifacts
  • Perlin noise
  • Uniform noise applied inconsistently across the image
  • Bad camera exposure
  • Banding due to low bit depth
  • Gaussian blur
  • Pixelation, aliasing, or other sampling artifacts
  • Motion blur
  • Color transformations
  • Sequences of corruptions where the choice of degradation is different for each step
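
As an illustration of what "drop-in" might mean here, the noise step could be swapped for a deterministic degradation with a severity knob. Below is a rough sketch using pixelation on a grayscale 2-D array. `pixelate`, `make_pixelation_sequence`, and the block sizes are hypothetical choices for this example, and whether a model trains well on such sequences is exactly the open question.

```python
import numpy as np

def pixelate(image, block):
    """Replace each block x block tile with its mean.

    Assumes a 2-D array whose sides are divisible by `block`.
    """
    h, w = image.shape
    tiles = image.reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3), keepdims=True)
    return np.broadcast_to(means, tiles.shape).reshape(h, w)

def make_pixelation_sequence(image, blocks=(1, 2, 4, 8)):
    """Corruption sequence: each entry pixelates the clean image more coarsely."""
    return [pixelate(image, b) for b in blocks]

image = np.arange(64.0).reshape(8, 8)  # stand-in for a real image (assumption)
seq = make_pixelation_sequence(image)
```

The first entry is the clean image and the last is a flat image at the global mean, so the sequence plays the same role as "image N, image N+1, ..." in the noise case.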

Or more complex examples, perhaps for training a model to change the semantics of an image, or repair incoherent outputs:

  • Patches from other images
  • Images with incorrect labels blended in
  • Scrambling image patches with random transformations
  • Sequences of outputs from a GAN produced as its training progressed (but with the same seed)
  • DeepDream iterations

Or, in general, any distortion with these properties:

  • Cheap to produce or assemble sequences for
  • Causes the image to become more out-of-distribution from the uncorrupted image dataset for the given prompt

The motivation for asking is, if there isn't something "special" about noise and any drop-in corruption could be used, a diffusion model could:

  • Be used for blind image restoration; make an image "better" by human measure without changing it significantly.
  • Tweak the content of an image without unnecessarily removing details with noise; make an image match a prompt with minimal changes.

If there is something "special" about noise (e.g. the model or training procedure makes certain assumptions that depend on noise), what is special about noise, and how can diffusion models be modified to handle more general corruptions?

Thanks!

41

Comments


feliximo t1_iscxjm1 wrote

Already a paper on this, they call it Cold Diffusion.

47

zergling103 OP t1_isczxzg wrote

After quickly skimming through the paper, it appears that they use multiple models, one model per type of image degradation. I was hoping to learn about a single general model that can reverse any sequence of degradations. Perhaps it'd have better performance; for example, the de-blurring cold diffusion model seems to produce outputs that lack detail.
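
For context, my reading of the paper's sampling loop is roughly the sketch below. Treat the details as my assumption, not a faithful reimplementation. A restoration function is paired with one specific degradation function, which is why each degradation gets its own model. Here `fade_to_mean` is a toy degradation I made up, and `oracle` is a stand-in for a trained network that simply returns the known clean image.

```python
import numpy as np

def fade_to_mean(x0, t, num_steps):
    """Toy degradation D(x0, t): blend the image toward its mean; t=0 is clean."""
    alpha = t / num_steps
    return (1.0 - alpha) * x0 + alpha * x0.mean()

def cold_sample(x_t, restore, degrade, num_steps):
    """Step from severity num_steps down to 0.

    Each step estimates the clean image, then swaps the degradation at
    severity s for the one at severity s-1:
        x_{s-1} = x_s - D(x0_hat, s) + D(x0_hat, s-1)
    """
    x = x_t
    for s in range(num_steps, 0, -1):
        x0_hat = restore(x, s)
        x = x - degrade(x0_hat, s, num_steps) + degrade(x0_hat, s - 1, num_steps)
    return x

clean = np.arange(16.0).reshape(4, 4)
oracle = lambda x, s: clean  # stand-in for a trained restoration network
start = fade_to_mean(clean, 5, 5)
result = cold_sample(start, oracle, fade_to_mean, 5)
```

With a perfect restorer the loop lands back on the clean image; with a learned one, the quality of `restore` for that particular `degrade` is what limits the output.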

20

mrtransisteur t1_isvjfeu wrote

Sounds like you've got a research paper topic brewing!

2

zergling103 OP t1_iszme0k wrote

I have all sorts of ideas, but little means to test them :(

1

bdubbs09 t1_isdb32q wrote

There’s a paper that uses diffusion to essentially erase adversarial attacks/patches. I’m on mobile so I can’t link it but it’s an interesting application for sure

9

Towzeur t1_isefejd wrote

(Certified!!) Adversarial Robustness for Free!

3

ThrowThisShitAway10 t1_isdsaq4 wrote

As others have mentioned, Cold Diffusion proved this.

3

zergling103 OP t1_isdxd7x wrote

If I understand correctly, Cold Diffusion, like the original diffusion network, assumes the perturbations are made in pixel space. That is, noise or other corruptions are added to or removed from individual RGB values for each pixel.

Latent diffusion models seem to perform better. They encode the image using a pretrained autoencoder, then the perturbations are added to or removed from the latent vectors. The network trains to take steps in a model's latent space instead of in pixel space.

However, the latent space of an autoencoder is a kind of information bottleneck, so you wouldn't be able to use it to encode real-world degradation perfectly, or make lossless tweaks to a given image you want to restore.

I wonder if the two concepts can be merged somehow? A lossless autoencoder?
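
A toy way to see the bottleneck issue, under heavy simplifying assumptions: treat the autoencoder as a linear projection onto the first `k` columns of an orthogonal matrix. With `k` smaller than the input size the round trip loses information; with `k` equal to the input size it is lossless. Real latent diffusion autoencoders are nonlinear and learned, so this is only an analogy, and every name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Random orthogonal basis, standing in for learned encoder/decoder weights.
basis, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

def encode(x, k):
    """Project x onto the first k basis vectors (latent size k)."""
    return basis[:, :k].T @ x

def decode(z):
    """Map a latent vector back to input space using the matching basis columns."""
    k = z.shape[0]
    return basis[:, :k] @ z

x = rng.normal(size=dim)  # stand-in for a flattened image
err_bottleneck = np.linalg.norm(x - decode(encode(x, 4)))
err_full = np.linalg.norm(x - decode(encode(x, dim)))
```

Here `err_full` is zero up to floating-point error while `err_bottleneck` is not, which is the trade-off in question: a latent space with no compression can represent any tweak exactly, but it no longer gives the smaller space that makes latent diffusion cheap.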

4

Prinzessid t1_iseb58v wrote

I think you can train a denoising autoencoder without a bottleneck.

5

danja t1_isfxrfl wrote

Seems like there are maybe 3+ tangential problems here. Noise is one. Then for your 'simple' list, most of those are the result of direct non-linear transformations; I would imagine an old-school mix of convolution and trad neural nets could come up with their inverses fairly efficiently. The 'complex' list - hmm, the word Deep springs to mind ...

0