Submitted by ravik_reddit_007 t3_zzitu1 in technology
unua_nomo t1_j2e5rp7 wrote
I mean, honestly, it wouldn't be that hard to even crowdsource training an open source model, right?
misconfigbackspace t1_j2en6pf wrote
unua_nomo t1_j2enydh wrote
Crowdsource the funding, not the content the model is trained on
misconfigbackspace t1_j2erpp0 wrote
One-time funding is fairly easy. Getting a copy of that data is a little harder. That data will also go stale as the world moves forward, so that's the other big thing to keep in mind. I wonder what legal challenges will come up if the model copies material from litigious IP owners like Disney, the top music artists, Hollywood and the like.
unua_nomo t1_j2eyhnh wrote
I mean there are already open source datasets available, such as the Pile.
I can't see any argument for why a model derived from open source data would not likewise be open source. At that point, if you could argue that an ML model could produce IP-infringing content, that would be the responsibility of the individual producing and subsequently distributing that content.
As for data becoming stale, that wouldn't necessarily be an issue for plenty of applications, and even then there's no reason you couldn't just crowdfund 80k a year to train a newly updated model with newer content folded in.
misconfigbackspace t1_j2ez1sa wrote
> such as the Pile.
TIL. Thanks.
syfari t1_j2fekeo wrote
Challenges are already popping up from artists over diffusion models. A lot of this has already been settled, though, as courts have determined model training to fall under fair use.