misconfigbackspace t1_j2erpp0 wrote

One-time funding is fairly easy. Getting a copy of that data is a little harder. That data will go stale as the world moves forward, so that's the other big thing to keep in mind. I wonder what legal challenges will come up if the model copies stuff from litigious IP owners like Disney, the top music artists, Hollywood and the like.

1

unua_nomo t1_j2eyhnh wrote

I mean there are already open source datasets available, such as the Pile.

I can't see any argument for why a model derived from open source data wouldn't likewise be open source. At that point, if you could argue that an ML model could produce IP-infringing content, that would be the responsibility of the individual producing and subsequently distributing that content.

As for data becoming stale, that wouldn't necessarily be an issue for plenty of applications, and even then there's no reason you couldn't just crowdfund 80k a year to train an updated model with newer content folded in.

4

syfari t1_j2fekeo wrote

Challenges are already popping up from artists over diffusion models. A lot of this has already been settled though as courts have determined model training to fall under fair use.

2