samb-t t1_j50gpn4 wrote

I think what you're looking for is Palette, which does paired image-to-image translation with conditional diffusion models. I believe its approach is exactly what you're describing: concatenating the conditioning image with the noisy input along the channel dimension.
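
For what it's worth, here is a minimal sketch of the channel-concatenation idea, not Palette's actual code. It assumes NCHW arrays and a standard DDPM forward-noising step; the function name `make_denoiser_input` and the parameter `alpha_bar` (cumulative noise level) are made up for illustration.

```python
import numpy as np

def make_denoiser_input(cond, target, alpha_bar, rng):
    """Build the denoiser input for one training step (hypothetical helper).

    cond, target: arrays of shape (N, C, H, W), a paired source/target batch.
    alpha_bar: assumed cumulative noise level in (0, 1) for this step.
    """
    # Standard DDPM forward process: mix the clean target with Gaussian noise.
    noise = rng.standard_normal(target.shape)
    noisy = np.sqrt(alpha_bar) * target + np.sqrt(1.0 - alpha_bar) * noise
    # Condition by stacking the source image and the noisy target along
    # the channel axis (axis=1 in NCHW layout).
    x_in = np.concatenate([cond, noisy], axis=1)
    return x_in, noise

rng = np.random.default_rng(0)
cond = rng.standard_normal((2, 3, 8, 8))    # conditioning (source) images
target = rng.standard_normal((2, 3, 8, 8))  # paired target images
x_in, noise = make_denoiser_input(cond, target, alpha_bar=0.5, rng=rng)
print(x_in.shape)  # (2, 6, 8, 8)
```

The denoising network would then take a 6-channel input and be trained to predict `noise`, so the only architectural change from an unconditional model is the number of input channels.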

6

pilooch t1_j59g48g wrote

Absolutely, I second this: Palette is what you are looking for. We have a modified version in JoliGAN, with a PR for various kinds of conditioning, including masks and sketches, cf. https://github.com/jolibrain/joliGAN/pull/339

Palette-like DDPMs work exceptionally well (we have industrial-grade use cases), but they require a paired dataset, which is the number one drawback I see at the moment. My understanding is that unpaired diffusion remains an open research area: there is at least one work on it (UNIT-DDPM), but it has no known public implementation.

1