HoLeeFaak
HoLeeFaak t1_irvoxe5 wrote
Reply to comment by MohamedRashad in [D] Reversing Image-to-text models to get the prompt by MohamedRashad
What you propose is a cycle loss. It's valid, but the biggest problem is the non-differentiable parts, and I haven't found a solution to that.
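To make the non-differentiable part concrete, here's a toy sketch of a cycle loss with a discrete bottleneck in the middle (pure Python, no real models). `captioner_logits`, `generate`, and the 3-token vocabulary are hypothetical stand-ins for an image-to-text and a text-to-image model. Because the token is picked with argmax, a small change to the captioner weight `w` usually doesn't change the token, so the finite-difference gradient of the loss with respect to `w` is zero.

```python
# Hypothetical toy "captioner": maps a scalar image feature to logits over 3 tokens.
# w is its only trainable parameter.
def captioner_logits(image: float, w: float) -> list[float]:
    return [w * image, 1.0 - w * image, 0.5]

# Hypothetical toy "generator": maps a token id back to a scalar "image".
TOKEN_TO_IMAGE = [0.0, 1.0, 2.0]

def generate(token: int) -> float:
    return TOKEN_TO_IMAGE[token]

def argmax(xs: list[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])

def cycle_loss(image: float, w: float) -> float:
    # image -> text -> image, then compare with the original image
    token = argmax(captioner_logits(image, w))  # discrete step: blocks the gradient
    reconstruction = generate(token)
    return (image - reconstruction) ** 2

def finite_diff(f, x: float, eps: float = 1e-4) -> float:
    # Central-difference estimate of df/dx
    return (f(x + eps) - f(x - eps)) / (2 * eps)

image, w = 0.8, 0.3
loss = cycle_loss(image, w)
grad_w = finite_diff(lambda w_: cycle_loss(image, w_), w)
print(loss, grad_w)
```

The loss only changes when `w` moves far enough to flip which token wins, so it's piecewise constant in `w`: zero gradient almost everywhere and a jump at the boundaries. Gradient descent has nothing to follow.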
HoLeeFaak t1_irvnlrf wrote
That's a pretty hard problem, because text generation involves argmax/sampling, which is not differentiable, so it's hard to optimize a model to generate text that is then fed as input to a text2img model to reproduce a given image. I guess you could do something similar to https://arxiv.org/abs/2111.14447, replacing CLIP with Stable Diffusion and changing the objective a bit, but I think it will be hard to optimize.
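The same goes for sampling, not just argmax. A minimal sketch, assuming sampling is done by inverse-CDF with the uniform draw `u` held fixed: nudging one logit by a tiny `eps` leaves both the argmax token and the sampled token unchanged, so anything computed downstream of the token (e.g. the text2img output) gets a zero finite-difference gradient. The logits and `u` are made-up values for illustration.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits: list[float], u: float) -> int:
    """Inverse-CDF sampling with a fixed uniform draw u in [0, 1)."""
    probs = softmax(logits)
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def argmax_token(logits: list[float]) -> int:
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [1.0, 2.0, 0.5]
u = 0.5  # one fixed uniform draw
eps = 1e-4
bumped = [logits[0] + eps] + logits[1:]  # nudge the first logit slightly

# The token (and therefore everything downstream of it) does not move.
print(argmax_token(logits), argmax_token(bumped))
print(sample_token(logits, u), sample_token(bumped, u))
```

The probabilities do shift a little when a logit is nudged, but the chosen token doesn't, and the token is the only thing the text2img model sees.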
HoLeeFaak t1_ivkbo4e wrote
Reply to [D] At what tasks are models better than humans given the same amount of data? by billjames1685
Chess