samb-t t1_j50gpn4 wrote on January 19, 2023 at 3:24 PM

I think what you're looking for is Palette, which does paired image-to-image translation with conditional diffusion models. I believe that approach is exactly what you're describing: concatenating the conditioning image with the noisy image along the channel dimension.
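
For what it's worth, here is a minimal sketch of that channel-wise conditioning. It is not Palette's actual code: NumPy stands in for a deep learning framework, the NCHW layout is an assumption, and `q_sample` and `build_denoiser_input` are hypothetical helper names made up for illustration. The denoising network itself is left out; the point is only how its input would be assembled.

```python
import numpy as np

def q_sample(y0, alpha_bar_t, rng):
    """Forward-diffuse the clean target y0 to the noise level alpha_bar_t (standard DDPM forward process)."""
    noise = rng.standard_normal(y0.shape)
    y_t = np.sqrt(alpha_bar_t) * y0 + np.sqrt(1.0 - alpha_bar_t) * noise
    return y_t, noise

def build_denoiser_input(cond, y_t):
    """Stack the conditioning image and the noisy target along the channel axis (axis 1 in NCHW)."""
    if cond.shape[0] != y_t.shape[0] or cond.shape[2:] != y_t.shape[2:]:
        raise ValueError("cond and y_t must share batch size and spatial dimensions")
    return np.concatenate([cond, y_t], axis=1)

rng = np.random.default_rng(0)
cond = rng.standard_normal((4, 3, 64, 64))  # paired source image (could also be a mask or sketch)
y0 = rng.standard_normal((4, 3, 64, 64))    # paired target image
y_t, noise = q_sample(y0, alpha_bar_t=0.5, rng=rng)
net_in = build_denoiser_input(cond, y_t)
print(net_in.shape)  # (4, 6, 64, 64)
```

Under these assumptions the denoiser's first convolution would take 6 input channels instead of 3 and would be trained to predict `noise` from `net_in` and the timestep. This is also why paired data matters: `cond` and `y0` have to be pixel-aligned for the stacking to make sense.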

pilooch t1_j59g48g wrote on January 21, 2023 at 10:05 AM

Absolutely, I second this: Palette is what you are looking for. We have a modified version in JoliGAN, with a PR adding various kinds of conditioning, including masks and sketches, cf. https://github.com/jolibrain/joliGAN/pull/339

Palette-like DDPM works exceptionally well (we have industrial-grade use cases), but it requires a paired dataset, which is the number one drawback I see atm. My understanding is that unpaired diffusion remains an open research area, apart from at least one work (UNIT-DDPM), which has no known public implementation.