Submitted by happyhammy t3_zm51z0 in MachineLearning
abriec t1_j0b6frn wrote
What is “good” music?
Certainly not the full picture, but imo one of the reasons we don’t see music generation taking off the same way as image/text is that it’s more difficult to evaluate, and therefore to benchmark and iterate on.
It faces challenges similar to those of generative modelling in other modalities, but evaluation is arguably more subjective and time-consuming, and requires more training if human evaluators are used. A layperson can easily tell if an image or text is “good”; it’s more complex for music once it gets above a certain minimum quality threshold.
From a business perspective it’s harder to sell too, given the narrower scope of applications (relative to language and vision), as interesting as the problem sounds to us.
Plus, echoing the other comment, I feel it’s reductionistic to flatten music into spectrograms when there are interleaved elements. My intuition is it’ll be better to model dependencies between individual “tracks” as well. I’m sure there’s extensive work on music generation with good results already, just not quite in the spotlight yet.
Osemwaro t1_j0c0fp2 wrote
If by "a layperson can easily tell if an image or text is “good”", you mean a layperson can easily tell if the image depicts a physically plausible or photo-realistic scene, or if the text makes sense, then I agree that music is harder in this sense. The closest musical analogy for these quality issues is perhaps telling whether or not the instruments sound realistic, and laypeople don't spend enough time focusing on the sound of real instruments to be really good at this.
But if you're talking about judging artistic merit, then I don't think a layperson is any better at doing this with images and text than they are with music. Artistic judgement is extremely subjective across all fields of artistic expression, and experts in these fields often disagree with each other, or with the general public, about what's good and what isn't. E.g. compare the popularity of Fifty Shades of Grey to its critical reception.
There's a massive commercial demand for music in TV, films, advertising, games, theatre and online video creation too, so I don't think it would be that hard to make a business case for it, if the data was readily available.