TFenrir OP t1_j0dk5rr wrote
Reply to comment by Kinexity in Riffusion: Stable diffusion fine tuned on spectrograms (image representations of music) creates prompt based music, in real time by TFenrir
I mostly agree, but I think there is some opportunity here. Using img2img in real time to extend audio indefinitely, and the relationship between images and audio more generally, are quite interesting: would a model trained only on these spectrogram images give a "better" result? Would different fine-tuned models give different experiences? How would other improvements to the underlying models affect this?
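For anyone curious what that real-time extension loop could look like, here's a rough sketch using the Hugging Face diffusers img2img pipeline. The model id, prompt, and strength values are assumptions rather than Riffusion's actual settings, and the spectrogram-to-audio step (e.g. Griffin-Lim) is omitted:

```python
# Illustrative sketch: extend a spectrogram "song" chunk by chunk with img2img.
# Assumes the riffusion/riffusion-model-v1 checkpoint and a seed spectrogram
# image on disk; parameters here are guesses, not the project's real values.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "riffusion/riffusion-model-v1", torch_dtype=torch.float16
).to("cuda")

chunk = Image.open("seed_spectrogram.png").convert("RGB").resize((512, 512))
chunks = [chunk]

for _ in range(4):  # each iteration produces the next ~5s spectrogram chunk
    chunk = pipe(
        prompt="funk bassline with a jazzy saxophone solo",
        image=chunk,          # previous chunk seeds the next one
        strength=0.6,         # lower = more continuity, higher = more variation
        guidance_scale=7.0,
    ).images[0]
    chunks.append(chunk)

# The chunks would then be converted back to audio (e.g. via Griffin-Lim)
# and concatenated or crossfaded to play continuously.
```

Strength is the interesting knob here: too low and the clip just loops, too high and the continuation stops sounding like the same piece.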