bluehands t1_j2cyz5r wrote

Your list of "text-to-X" highlights for me the need for "X-to-text". Captioning is nice, but are names attached? Is meaning extracted? (It may be that I am just not aware of the state of the art.)

3

currentscurrents t1_j2czmdk wrote

Basically anything you can generate, you can also classify. Most of the image generators use CLIP for guidance, so if they can generate a sad face (and they can), CLIP can tell you whether or not a face is sad.
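
To make that concrete, here is a rough sketch of how CLIP-style zero-shot classification scores an image against text prompts: compare the image embedding with each prompt embedding by cosine similarity, then softmax over the prompts. The 3-d vectors below are made-up stand-ins for what a real CLIP image and text encoder would produce, and `zero_shot_scores`, the prompt wording, and the scale of 100 are assumptions for illustration, not an actual library API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_scores(image_emb, text_embs, scale=100.0):
    """Softmax over scaled cosine similarities, one score per prompt.

    `scale` plays the role of a temperature on the logits; 100.0 is an
    assumed value for this sketch.
    """
    logits = {label: scale * cosine(image_emb, emb) for label, emb in text_embs.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical embeddings; a real pipeline would get these from the encoders.
image_emb = [0.9, 0.1, 0.2]
text_embs = {
    "a photo of a sad face": [0.8, 0.2, 0.1],
    "a photo of a happy face": [0.1, 0.9, 0.3],
}

scores = zero_shot_scores(image_emb, text_embs)
best = max(scores, key=scores.get)
print(best, scores)
```

With these toy vectors the "sad face" prompt wins, because its embedding points in nearly the same direction as the image embedding. The same comparison is what lets a generator be steered toward a prompt and a classifier pick between prompts.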

9