Tanglemix t1_iw8deqf wrote

I just searched for 'A Dragon fighting a warrior' on the Lexica site (Lexica is a library of AI art) and then typed the same search term into Google Image Search.

What you find is a continuum, with some truly incoherent images at one end and some brilliant images at the other. For part of that range, I would say the human-made images and the AI-made images overlap: they are of equal quality and coherence.

At a certain point, though, the best of the human-made art does stand out as clearly superior, and not mainly because its technique is better.

What does distinguish the best human art is the way the images have been structured and composed. Where AI seems to fail at present is in its ability to tell the story of the image in a strong, coherent way, and the same failing is often seen in the work of non-professional human artists too.

It's not clear to me how any evolution of the current AI generators solves this problem, because the domain involved is non-verbal. No degree of refinement in the language model will grant access to, or control over, this level of the image creation process.

Some aspects of image creation rely on shared cultural understandings to be effective, so solving this problem would need a different kind of AI: one that understood that the simple phrase 'A Dragon fighting a Warrior' is a narrative idea. That idea might involve subtle concerns such as the 'eye lines' between the protagonists (do they seem to be looking at each other?) and the ways their individual postures and gestures interact to describe their relationship (are they together, or in opposition to each other?). These concerns multiply exponentially as the scene becomes more complex and more characters and elements are added to the mix.

It seems to me that something approaching a true AGI would be needed for AI art to address these kinds of concerns. But I'm no expert, so I'd be interested to hear if anyone thinks this is wrong: do the current models have the ability to incorporate these kinds of abstract concerns in their output?