Submitted by TheCockatoo t3_10m1sdm in MachineLearning
moschles t1_j62iaxi wrote
GANs produce an image in one shot: a single forward pass of the generator yields the whole image at once.
Diffusion models use a different trick: they remove noise incrementally over many small steps, and cascaded pipelines additionally follow the base model with dedicated super-resolution stages.
Technically speaking, you could start from a GAN's output and then run it through rounds of super-resolution; the result would look a lot like what diffusion models produce. That leaves the question of how the new details would be guided, or more technically, what the super-resolution features would be conditioned on. If you are going to condition them on text embeddings, you might as well condition the whole process on the same embedding... at which point you just have a diffusion model.
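The iterative, conditioned denoising loop can be sketched roughly like this. Everything here is a toy stand-in under stated assumptions: `fake_denoiser` is a hypothetical function (a real diffusion model uses a trained neural network predicting noise), and the "embedding" is just a placeholder vector, not a real text embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_denoiser(x, t, cond):
    # Hypothetical stand-in for a trained denoising network:
    # nudges the current sample a little toward the conditioning vector.
    return x + 0.1 * (cond - x)

def sample(cond, steps=50, shape=(8,)):
    x = rng.standard_normal(shape)        # start from pure noise
    for t in reversed(range(steps)):
        x = fake_denoiser(x, t, cond)     # incremental noise removal,
                                          # conditioned at every step
    return x

embedding = np.ones(8)                    # placeholder "text embedding"
out = sample(embedding)
print(np.allclose(out, embedding, atol=0.1))  # the sample converged toward
                                              # whatever the conditioning asked for
```

The point of the sketch is just the structure: because the conditioning signal is consulted at every denoising step, it guides all the detail that gets added along the way, which is exactly why conditioning the whole process is the natural design.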
A second weakness of GANs is the narrowness of their output variety. When asked to produce samples for a category like "dog", they tend to produce nearly the same dog every time, a failure mode known as mode collapse.
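Mode collapse is easy to illustrate with a toy "collapsed" generator. This is a hypothetical sketch, not a real GAN: the generator nearly ignores its latent input, so wildly different latents map to almost the same output.

```python
import numpy as np

rng = np.random.default_rng(1)

def collapsed_generator(z):
    # Hypothetical collapsed generator: it almost ignores the latent z,
    # so every "dog" it draws is nearly the same dog.
    template = np.ones(8)         # the one mode it collapsed onto
    return template + 0.01 * z    # only a faint trace of z survives

z = rng.standard_normal((1000, 8))     # diverse latent inputs
samples = collapsed_generator(z)

print(z.std())        # roughly 1.0: the inputs vary a lot
print(samples.std())  # roughly 0.01: the outputs barely vary at all
```

Comparing the spread of the latents to the spread of the outputs is a crude but direct way to see the diversity loss the comment describes.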