Submitted by teraRockstar t3_y4xjxw in MachineLearning
GANs appear to have been supplanted by diffusion models. What do you think?
GANs generally have faster inference, since a sample takes a single generator forward pass rather than many denoising steps. If latency or throughput is a requirement, a GAN might be the better choice.
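To make the latency point concrete, here's a toy timing sketch. The "models" below are stand-ins, not real generators or samplers; the point is just that a GAN sample is one forward pass, while a diffusion sample loops over many denoising steps.

```python
import time
import torch

# Hypothetical stand-ins for a GAN generator and a diffusion denoiser.
gan_generator = torch.nn.Sequential(
    torch.nn.Linear(128, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 3 * 64 * 64)
)
denoiser = torch.nn.Sequential(
    torch.nn.Linear(3 * 64 * 64, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 3 * 64 * 64)
)

z = torch.randn(1, 128)            # GAN latent
x = torch.randn(1, 3 * 64 * 64)    # diffusion starting noise

with torch.no_grad():
    t0 = time.perf_counter()
    _ = gan_generator(z)                       # one forward pass per sample
    gan_ms = (time.perf_counter() - t0) * 1e3

    t0 = time.perf_counter()
    for _ in range(50):                        # e.g. 50 denoising steps
        x = x - 0.1 * denoiser(x)              # toy update, not a real sampler
    diff_ms = (time.perf_counter() - t0) * 1e3

print(f"GAN: {gan_ms:.1f} ms vs diffusion (50 steps): {diff_ms:.1f} ms")
```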
Academic research in GANs has slowed down, but I believe they still have more industrial applications currently in use, in tools like Photoshop.
Am I the only one who finds it a little bizarre how quickly people on this sub assume concepts have gone obsolete? I understand how fast this industry is progressing, but just because something isn't the center of attention right now doesn't make it useless or outdated.
I'm definitely not qualified to criticize, but sometimes it feels like people are so desperate to be the inventor of the next big thing that they get lost chasing trends, and in a weird way they keep themselves from inventing the next visionary concept by doing the same stuff everyone else is into at the time.
Some of my colleagues and I work with GANs daily in industry-grade applications.
My current understanding is that, because they rely on explicit supervision, DDPMs do not directly apply to unpaired datasets, which is where GANs shine. There are a few papers about this, though, so support should emerge as well. Bear in mind that in industry some datasets are unpaired by the nature of the problem. DDPMs are insanely good as soon as the dataset is paired.
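For the unpaired case, the rough idea is a CycleGAN-style cycle-consistency loss: translate to the other domain and back, then penalize the reconstruction error. A minimal sketch with toy linear "generators" (shapes and names are illustrative only, and the usual adversarial losses are omitted):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the two generators of a CycleGAN-style setup (A->B and B->A).
# Real generators are convolutional; these are only for illustration.
G_ab = torch.nn.Linear(256, 256)
G_ba = torch.nn.Linear(256, 256)

def cycle_consistency_loss(real_a, real_b, lam=10.0):
    """Unpaired training signal: translate, map back, compare to the original."""
    rec_a = G_ba(G_ab(real_a))   # A -> B -> A
    rec_b = G_ab(G_ba(real_b))   # B -> A -> B
    return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))

real_a = torch.randn(4, 256)  # unpaired samples from domain A
real_b = torch.randn(4, 256)  # unpaired samples from domain B
loss = cycle_consistency_loss(real_a, real_b)  # added to the adversarial losses
print(loss.item())
```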
GAN generators are very controllable at inference time, including in real time. DDPMs will follow, but they are not quite there yet AFAIK.
Another quick observation: GANs are more difficult to train, but modern implementations and libraries do converge fast and accurately.
Can you provide an example of an implementation and a library that make GANs converge well? I tried some a couple of years ago and it wasn't easy for me, so I'm curious.
We use https://github.com/jolibrain/joliGAN, a library for image2image translation with additional "semantic" constraints, i.e. when there's a need to preserve labels, physics, or anything else between the two domains. This lib aggregates and improves on existing works.
If you are looking for more traditional noise -> xxx GANs, go for https://github.com/autonomousvision/projected_gan/. Another recent work is https://github.com/nupurkmr9/vision-aided-gan.
The key element in GAN convergence is the discriminator. joliGAN above defaults to multiple discriminators, combining and improving on the works above, which ensures fast early convergence and stability while the semantic constraints narrow the path to relevant modes.
We've found that transformers as generators have interesting properties on some tasks and converge well with a ViT-based projected discriminator.
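Not the actual joliGAN or Projected GAN code, but a rough sketch of the multi-discriminator / projected-discriminator idea: several small discriminators score features from a frozen pretrained backbone, which is a big part of what stabilizes early training. The backbone choice, discriminator sizes, and the hinge loss below are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen backbone providing feature projections. Pretrained ImageNet weights are
# what make projection useful in practice; weights=None keeps this sketch offline-runnable.
backbone = torchvision.models.resnet18(weights=None)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()
for p in feature_extractor.parameters():
    p.requires_grad_(False)

# Several small discriminators on the projected features; sizes are illustrative.
discriminators = torch.nn.ModuleList(
    [torch.nn.Conv2d(512, 1, kernel_size=1) for _ in range(3)]
)

def d_hinge_loss(real_imgs, fake_imgs):
    """Hinge loss averaged over multiple discriminators on frozen backbone features."""
    real_f = feature_extractor(real_imgs)
    fake_f = feature_extractor(fake_imgs.detach())
    loss = 0.0
    for d in discriminators:
        loss = loss + F.relu(1.0 - d(real_f)).mean() + F.relu(1.0 + d(fake_f)).mean()
    return loss / len(discriminators)

real = torch.randn(2, 3, 128, 128)   # real batch (placeholder)
fake = torch.randn(2, 3, 128, 128)   # generator output (placeholder)
print(d_hinge_loss(real, fake).item())
```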
I think that depends on the dataset. If you train a GAN on faces only, it will give you excellent images of faces. If you train a GAN on ImageNet, it will give you bad faces. It's the same for all kinds of image generation models. At least to my understanding, it's a data issue and not a model issue, but please correct me if I'm wrong.
Edit: I worked with GANs for the last couple of years in my PhD. The faces that SOTA models produce when trained on ImageNet or COCO look like crap. They look about as bad as the faces I get when I try out the Stable Diffusion web demo.
You can correct faces by masking the face and relaunching the diffusion on that cropped region.
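The diffusers inpainting pipeline does roughly this: white out the face region in a mask and re-run the diffusion only there. A rough sketch, where the input file, crop box, and prompt are placeholders (in practice the box would come from a face detector):

```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image, ImageDraw

# Inpainting pipeline; model id is one common choice, not the only option.
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")

image = Image.open("generated.png").convert("RGB").resize((512, 512))
face_box = (180, 100, 330, 260)  # placeholder coordinates for the bad face

# White = regenerate, black = keep.
mask = Image.new("L", image.size, 0)
ImageDraw.Draw(mask).rectangle(face_box, fill=255)

fixed = pipe(prompt="a photo of a person, detailed face",
             image=image, mask_image=mask).images[0]
fixed.save("fixed.png")
```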
Not really! One might argue that diffusion models are better, but both GANs and diffusion models have a reason to exist.
With diffusion models, you have a lot of room to play around with different types of data and change your architecture according to that data.
But where GANs outshine is in tricky tasks such as increasing resolution and more. GANs may feel old-fashioned now, but there's a lot left to explore with them, since the adversarial setup is a dynamic architecture, unlike diffusion, which is static.
Old-fashioned is a strong word.
When a research breakthrough happens (e.g. Diffusion models nowadays), every lab jumps in to get (or publish) the low-hanging fruit, so publications about that topic explode.
Then we meet the limits of the technology and the law of diminishing returns kicks in. All of a sudden, most researchers shift gears and move to something more popular. Echo chambers on the former topic dissolve.
Research is still being done on those topics, but at another pace (probably for the better, as published improvements are more fundamental and less incremental).
Another good example of this is reinforcement learning. It was all the rage when DeepMind published the Atari paper, but as innovation slowed down, lots of researchers moved away from it to publish on generative models.
This is why I dislike research. It feels like it has less to do with convictions and researchers' real interests, and more to do with becoming an influencer and getting lots of citations in a short period of time.
What do you mean by that? If you have a diffusion-based model trained to generate images, it will do it quite well, comparable to GANs or even better. Are you talking about text2img solutions?
EmbarrassedHelp t1_isgoo3p wrote
They are still being used to correct faces and other issues in diffusion model outputs.