Submitted by happyhammy t3_zm51z0 in MachineLearning
My theory:
- No good public datasets, unlike image datasets such as LAION.
- Music datasets are harder (and often illegal) to acquire. Shady methods like torrenting are usually required to build a large one. The only music datasets I've found are classical, and even those are very limited: the compositions may be public domain, but the recorded performances are still copyrighted.
Therefore, large companies like OpenAI/Google can't take the legal risk of building a good generative music AI. Startups have a better chance because they have less to lose and can better hide the fact that they trained their model on copyrighted material.
Other than that, I don't believe audio is harder to process than images, because an audio file can be reduced to its spectrogram, which is just a 2D image.
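A minimal sketch of that reduction, using SciPy on a synthetic test tone (the sample rate and window size here are illustrative choices, not anything from the post):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 22050                        # assumed sample rate in Hz
t = np.arange(fs * 2) / fs        # 2 seconds of audio
x = np.sin(2 * np.pi * 440 * t)   # a 440 Hz sine standing in for real music

# Short-time Fourier analysis turns the 1D waveform into a 2D array:
# rows are frequency bins, columns are time frames -- i.e. an "image".
f, frames, Sxx = spectrogram(x, fs=fs, nperseg=1024)
print(Sxx.shape)  # (frequency bins, time frames)

# The brightest row sits near the tone's frequency.
peak_hz = f[np.argmax(Sxx.mean(axis=1))]
print(round(peak_hz))
```

That 2D array is what image-style generative models would consume; going back to audio requires inverting the spectrogram (e.g. Griffin-Lim or a neural vocoder), which is lossy since the phase is discarded.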
TLDR: No good datasets
zero0_one1 t1_j0ag17x wrote
I think modeling the whole audio file at once is a worse direction than modeling melody, vocals, chords, lyrics, and beats as separate elements. I've focused on melody and I'm happy with my results: https://www.youtube.com/playlist?list=PLoCzMRqh5SkFPG0-RIAR8jYRaICWubUdx. I'm about to run a Mechanical Turk study comparing their quality against melodies from human-written hits.