Submitted by carlthome t3_10mhbqv in MachineLearning
starstruckmon t1_j6d3lsr wrote
Reply to comment by Maximum-Nectarine-13 in [D] MusicLM: Generating Music From Text by carlthome
I can guarantee the next paper out of this Google team is going to be a diffusion model (instead of AudioLM) conditioned on MuLan embeddings.
The strength of the Google model is its text understanding, which comes from the MuLan embeddings, while the strength of the work you highlighted is the quality from the diffusion model.
It's the obvious next step, following the same path as DALL-E 1 -> DALL-E 2.