Viewing a single comment thread. View all comments

Ronny_Jotten t1_j4b5fqx wrote

Images and text are already quite different from each other though, in terms of AI generators. The image generators include a language model, but work on a diffusion principle that the text generators don't use. Riffusion's approach of using a diffusion image generator with sonograms is interesting to some extent, but I sincerely doubt it will be the future direction of high-quality music generators.

5