TFenrir OP t1_j0dk5rr wrote
Reply to comment by Kinexity in Riffusion: Stable diffusion fine tuned on spectrograms (image representations of music) creates prompt based music, in real time by TFenrir
I mostly agree, but I think there is some opportunity here. Using img2img in real time to extend audio indefinitely, and the relationship between images and audio more generally, are quite interesting: would a model trained only on these spectrogram images give a "better" result? Would different fine-tuned models give different experiences? How would other improvements to the underlying models affect this?
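For anyone curious what that real-time extension loop could look like, here's a rough sketch using the Hugging Face diffusers img2img pipeline. The model id, prompt, and strength values are assumptions rather than Riffusion's actual settings, and the spectrogram-to-audio step (e.g. Griffin-Lim) is omitted:

```python
# Illustrative sketch: extend a spectrogram "song" chunk by chunk with img2img.
# Assumes the riffusion/riffusion-model-v1 checkpoint and a seed spectrogram
# image on disk; parameters here are guesses, not the project's real values.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "riffusion/riffusion-model-v1", torch_dtype=torch.float16
).to("cuda")

chunk = Image.open("seed_spectrogram.png").convert("RGB").resize((512, 512))
chunks = [chunk]

for _ in range(4):  # each iteration produces the next ~5s spectrogram chunk
    chunk = pipe(
        prompt="funk bassline with a jazzy saxophone solo",
        image=chunk,          # previous chunk seeds the next one
        strength=0.6,         # lower = more continuity, higher = more variation
        guidance_scale=7.0,
    ).images[0]
    chunks.append(chunk)

# The chunks would then be converted back to audio (e.g. via Griffin-Lim)
# and concatenated or crossfaded to play continuously.
```

Strength is the interesting knob here: too low and the clip just loops, too high and the continuation stops sounding like the same piece.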