Submitted by MohamedRashad t3_y14lvd in MachineLearning
milleniumsentry t1_irwa0j0 wrote
Reverse prompt tutorial. (CLIP Interrogator)
https://www.youtube.com/watch?v=JPBtaAQ2H2Y
Keep in mind that there is no metadata/stored prompt data in the image, so it cannot tell you the exact prompt used. It will, however, tell you how the model views the image, and how to generate something similar.
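If you want to script it rather than follow the video's workflow, a minimal sketch with the open-source clip-interrogator package could look like this (package name, model name, and file path are my assumptions, not something from the video):

```python
# Minimal sketch using the open-source clip-interrogator package
# (pip install clip-interrogator). Model name and file path are
# assumptions for illustration.
from PIL import Image
from clip_interrogator import Config, Interrogator

ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
image = Image.open("mystery_image.png").convert("RGB")

# Returns a text description CLIP considers close to the image --
# an approximation of a prompt, not the one actually used.
print(ci.interrogate(image))
```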
MohamedRashad OP t1_irwcx4x wrote
This is the closest thing to what I want.
Thanks
adam_jc t1_irwh173 wrote
there is a version on Replicate you can try easily
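If you'd rather call it from code than the web demo, a rough sketch with Replicate's Python client might look like the following (the model slug is my guess and the version hash is a placeholder, so check the model page):

```python
# Rough sketch of calling a hosted CLIP Interrogator on Replicate
# (pip install replicate; needs REPLICATE_API_TOKEN set). The model slug
# and version hash are placeholders -- look them up on the model page.
import replicate

output = replicate.run(
    "pharmapsychotic/clip-interrogator:<version-hash>",
    input={"image": open("mystery_image.png", "rb")},
)
print(output)
```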
MohamedRashad OP t1_irwnbma wrote
This is amazing (there are also other projects built on the same idea).
Thanks a lot
milleniumsentry t1_irxphr1 wrote
Anytime! Good luck on your endeavors!
JoeySalmons t1_irwbazh wrote
I am really surprised it took this long for this to be mentioned/suggested. I was just about to comment about it too. Anyone who has used automatic1111's webui for Stable Diffusion will also know about its built-in CLIP interrogate feature, which works somewhat well for Stable Diffusion. It might also work for other txt2img models.
nmkd t1_irx0805 wrote
Feeding the CLIP interrogator result back into Stable Diffusion results in completely different images though.
It's not good.
milleniumsentry t1_irxa3sg wrote
No no. It only tells you what prompts it would use to generate a similar image. There is no actual prompt data accessible in the image or its metadata. With millions of seeds and billions of word combinations, you wouldn't be able to reverse-engineer it.
I think having the prompt embedded in the file, for those interested, would be a great step. Then you could just read the file and go from there.
visarga t1_irziac5 wrote
Now is the time to convince everyone to embed the prompt data in the generated images, since the trend is just starting. It could also be useful later, when we crawl the web, to separate real images from generated ones.
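As a concrete example, here's a small sketch of embedding the prompt in plain PNG text chunks via Pillow (the key names are arbitrary choices on my part, not an established standard):

```python
# Sketch: write the prompt into PNG metadata with Pillow, then read it back.
# The key names ("prompt", "seed") are arbitrary, not a standard.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

meta = PngInfo()
meta.add_text("prompt", "a robotic cow with an astronaut friend")
meta.add_text("seed", "1234567890")

Image.open("generated.png").save("generated_tagged.png", pnginfo=meta)

# Later, anyone (or a crawler) can recover the prompt:
print(Image.open("generated_tagged.png").text["prompt"])
```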
milleniumsentry t1_is13giv wrote
I honestly think this will be a step in the right direction. Not actually for prompt sharing, but for refinement. These networks will start off great at telling you "that's a hippo" or "that's a potato", but what happens when someone wants to create a hippotato?
I think that without some sort of tagging/self-reference, the data runs the risk of self-reinforcement, since the main function of the task is to bash a few things together into something else. At what point will it need extra information so that it knows: yes, this is what they wanted, this is a good representation of the task?
A tag-back loop would be phenomenal. Imagine you ask for a robotic cow with an astronaut friend. Some of those images will be lacking robot features, some won't look like cows, etc. Ideally, your finished piece would be tagged as well, but perhaps missing the astronaut or another part of the initial prompt request. By checking which of the prompt's tags actually show up in the finished image's tags, the two can be compared for a soft 'success' rate.
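A very rough sketch of that soft success rate, assuming the clip-interrogator package and simple comma-separated tags (both assumptions on my part):

```python
# Sketch of a "tag back loop": interrogate the finished image, then check
# how many of the requested tags survive. Comma-splitting the prompt into
# tags is a deliberate simplification.
from PIL import Image
from clip_interrogator import Config, Interrogator

def tags(text: str) -> set[str]:
    return {t.strip().lower() for t in text.split(",") if t.strip()}

requested = tags("a robotic cow, an astronaut friend, on the moon")

ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
generated = tags(ci.interrogate(Image.open("output.png").convert("RGB")))

# Fraction of requested tags still visible in the result.
print(f"soft success rate: {len(requested & generated) / len(requested):.0%}")
```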