Submitted by cloneofsimo t3_ykiuq0 in MachineLearning

Hi. Today I came across this interesting paper https://arxiv.org/abs/2210.16056 that proposes a method to combine the semantics of text and image in the diffusion process.

In short, this mixes "layout" with "content", however unlike style transfer,

>"...semantic mixing aims to fuse multiple semantics into one single object."

I was surprised by the examples they showed, so I wanted to try it but the code wasn't available. I've implemented the method myself, and I wanted to share it here!

https://github.com/cloneofsimo/magicmix

Layout of "realistic photo of a rabbit" with content of "tiger"
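As I understand the paper (this is just a toy sketch of the idea, not my actual implementation; `switch_step` and the phase labels are my own made-up names, and in the real method the two branches are blended rather than hard-switched), the trick is a two-phase denoising schedule: the early steps establish the layout semantics, and the later steps are conditioned on the content prompt.

```python
def magicmix_schedule(total_steps, switch_step):
    """Toy sketch: label each denoising step with the prompt that would
    condition it. `switch_step` is a hypothetical knob, not the paper's
    notation; the real method blends the branches instead of switching."""
    return ["layout" if t < switch_step else "content"
            for t in range(total_steps)]

# First two steps follow the layout prompt, the rest follow the content prompt.
print(magicmix_schedule(6, 2))
```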

I hope my implementation helps anyone reading the paper!

Note: I'm not the author of the paper, and this is not an official implementation

88

Comments


Inevitable-Ad8503 t1_iuth7a3 wrote

Nice. Thanks for taking the time to implement and to share. I get the feeling that this approach will be far more coherent for certain types of content and layout (perhaps when prompting something like “a tiger and a bunny sitting side by side”), while the “traditional” approach (is it old enough to have traditions yet?) will be better for other types, such as “a tiger and a bunny”, where the intent is what that approach often produces: some interpolation between the two objects, a chimera or mutant.

Or maybe another way to explain it: MagicMix is akin to orchestrating multiple parts, while the present approach is more akin to a mashup of multiple parts; each has its own strengths and appropriate applications.

9

IntelArtiGen t1_iutivin wrote

Thanks for this implementation, I'll try it out!

2

cloneofsimo OP t1_iutiw0k wrote

Indeed, there are so many natural ways to interpolate between concepts, and I agree 100% that some are better than others at certain tasks.
Compared to the well-known Img2Img, I understood this as a "generalized" method of interpolation: if you take \mu = 1.0, it reduces to plain Img2Img interpolation. You can read the paper to see the effect of \mu on the interpolation, and it's quite interesting. Since this is a more general approach, there are more things to tweak and figure out, I guess?
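For concreteness, here's a minimal NumPy sketch of the kind of linear mixing I mean (my own reading with made-up names, not the paper's exact formulation): \mu weights the content branch, so at \mu = 1.0 the layout term vanishes and you're back to the plain Img2Img case.

```python
import numpy as np

def mix_latents(layout_latent, content_latent, mu):
    """Hypothetical per-step blend: mu weights the content branch and
    (1 - mu) weights the layout branch. mu = 1.0 drops the layout term
    entirely, matching the Img2Img special case."""
    return mu * content_latent + (1.0 - mu) * layout_latent

layout = np.zeros(4)
content = np.ones(4)
print(mix_latents(layout, content, 1.0))  # content branch only, the Img2Img case
print(mix_latents(layout, content, 0.5))  # an even blend of the two
```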

1

meldiwin t1_iuxmgkv wrote

I am not in the field, but I work on multi-material architectural designs, and one of my questions is how I can start designing such a system to come up with new architectural possibilities and new geometries. While reading the abstract of that paper, I saw it mentions "novel object synthesis"; what does that actually mean?

I am also struggling to understand the possibilities behind the explosion of diffusion models beyond art. Sorry if that sounds ignorant, but I want to understand this and hopefully deploy it in my work.

1

starstruckmon t1_iuxsbge wrote

Thanks. I understood it partially, but your explanation made everything crystal clear and things all clicked in an instant.

I wish more papers had an "intuition" section like this.

1

LetterRip t1_iuxy54g wrote

Pretty sure 'novel object' means an image that is the combination of multiple objects. For instance: dog + coffee_pot = a dog with some characteristics of a coffee pot (in the example images the head was sort of coffee-pot-like); rabbit + tiger = a rabbit with tiger characteristics; rabbit + sheep = a rabbit with sheep characteristics (the example showed a rabbit with a wool-like texture as opposed to rabbit fur).

1

msbeaute00000001 t1_iv2010u wrote

You just need to find someone working in ML/AI that you could work with. This person needs to understand your problem and translate it into an ML problem. If you don't mind, I can take a look; I am looking for a way to apply my skills anyway.

1

Tioben t1_iwei425 wrote

I've never been able to get a PC-based version of anything to run successfully, so if anyone makes or encounters a Colab, Hugging Face, or similar implementation, I'd much appreciate a link!

1