Kafke t1_j4a1yik wrote

Yes. Look at Stable Diffusion and Riffusion for examples of this. Music isn't fundamentally different from images and text in terms of how modern AI works.

4

Ronny_Jotten t1_j4b5fqx wrote

Images and text are already quite different from each other, though, in terms of AI generators. The image generators include a language model, but they work on a diffusion principle that the text generators don't use. Riffusion's approach of using a diffusion image generator on spectrograms (sonograms) is interesting to some extent, but I sincerely doubt it will be the future direction of high-quality music generators.
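
To make the spectrogram (sonogram) idea concrete, here is a rough sketch of turning audio into a 2-D array that an image model could treat as a picture. This is not Riffusion's actual code: it uses only NumPy, and the `spectrogram` helper, the sample rate, the frame length, and the hop size are all values assumed for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=128):
    """Magnitude spectrogram from a short-time Fourier transform (Hann window)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Rows are frequency bins, columns are time frames, like pixels in an image.
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 8000                         # assumed for illustration
t = np.arange(sample_rate) / sample_rate   # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)        # a 1 kHz test tone

image = spectrogram(tone)
peak_bin = int(image.mean(axis=1).argmax())  # brightest "row" of the image
peak_hz = peak_bin * sample_rate / 512       # bin index converted to Hz
```

The magnitude array throws away phase, so getting audio back out of a generated "image" needs a phase-reconstruction step such as Griffin-Lim, which is one reason the approach is lossy.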

5