tetrisdaemon OP t1_izjp9nc wrote
Reply to comment by calciumcitrate in [R] What the DAAM: Interpreting Stable Diffusion and Uncovering Generation Entanglement by tetrisdaemon
I'm looking into it, but my guess is that the entanglement comes from the CLIP text embeddings, so disentanglement might need to happen at that level. Some supporting evidence: even if we set the cross-attention to zero for some words, those words are still reflected in the final image, which suggests the word representations are already mixed inside CLIP.
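To make the intuition concrete, here's a toy NumPy sketch (not the DAAM code and not real CLIP or Stable Diffusion). Everything in it is an assumption for illustration: the vocabulary and queries are random vectors, `causal_self_attention` is a single made-up layer standing in for the text encoder, and "zeroing" cross-attention is modeled by masking a token's score before the softmax so its weight is exactly zero.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(tokens):
    """Toy stand-in for a text encoder: one causal self-attention layer
    with a residual connection, so each token mixes in earlier tokens."""
    n, d = tokens.shape
    scores = tokens @ tokens.T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # mask out later tokens
    scores = np.where(future, -np.inf, scores)
    return tokens + softmax(scores) @ tokens

def cross_attention(queries, context, zeroed=None):
    """Toy cross-attention; tokens listed in `zeroed` get attention weight 0."""
    scores = queries @ context.T / np.sqrt(context.shape[1])
    if zeroed:
        scores[:, zeroed] = -np.inf  # weight becomes exactly zero after softmax
    return softmax(scores) @ context

rng = np.random.default_rng(0)
d = 16
# Hypothetical random word vectors, not real CLIP embeddings.
vocab = {w: rng.normal(size=d) for w in ["a", "red", "blue", "car"]}
queries = rng.normal(size=(4, d))  # stand-in for image-side queries

def raw(prompt):
    return np.stack([vocab[w] for w in prompt.split()])

def leak(encode):
    # Zero the attention to the adjective (index 1) and compare the outputs
    # for two prompts that differ only in that adjective.
    out_red = cross_attention(queries, encode("a red car"), zeroed=[1])
    out_blue = cross_attention(queries, encode("a blue car"), zeroed=[1])
    return np.abs(out_red - out_blue).max()

leak_raw = leak(raw)                                        # no mixing
leak_mixed = leak(lambda p: causal_self_attention(raw(p)))  # with mixing

print("leak without mixing:", leak_raw)
print("leak with mixing:   ", leak_mixed)
```

Without mixing, zeroing the adjective removes it completely, so the two prompts give the same output. With the mixing layer, the "car" token has already absorbed some of "red" or "blue", so the outputs still differ even though the adjective itself is never attended to. That's the effect I suspect is happening in CLIP, but this sketch only illustrates the mechanism and proves nothing about the real model.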