Submitted by Tough_Gadfly t3_116wnco in technology
Slippedhal0 t1_j99zasl wrote
It's the same argument that artists make when complaining about copyrighted artwork being used as training data.
At some point there will be a major ruling about how companies training AI need to approach copyright for their training data sources, and if they rule in favour of copyright holders it will probably severely slow AI progress as systems to request permission are built.
Although I could maybe see a fine-tuned AI like Bing being less affected, because it cites sources rather than opaquely using previously acquired knowledge.
gurenkagurenda t1_j9ae1ic wrote
I don’t think it will slow AI at this point, so much as it will concentrate control over AI even more into the hands of well funded, established players. OpenAI has already hired an army of software developer contractors to produce training data for Codex. The same could be done even more cheaply for writers. The technology is proven now, so there’s no risk anymore. We know that you just need the training data.
So the upshot would just be a higher barrier to entry. Training a new model means not only funding the compute, but also paying to create the training set.
bairbs t1_j9agwyq wrote
Exactly. This is what big tech has been doing already to create legal and ethical data.
The training data is the bottleneck. OpenAI is trying to see if they can pull a fast one by releasing models using copyrighted material
gurenkagurenda t1_j9amk4h wrote
They’re not “pulling a fast one”. There’s no precedent here, and there’s a boatload of lawyers who agree that this is fair use. There are also a number who believe that it won’t be. The courts will have to figure it out, but until then, nobody knows how it will play out.
bairbs t1_j9ao00n wrote
They actually are. The precedent has been to use public domain material (which is why there are so many fine art style GANs), create your own data, pay for data to be created, pay for existing data, or keep the models private. There are plenty more artists and other jobs than lawyers who know this isn't fair use and will be negatively impacted if these companies are allowed to continue this practice.
gurenkagurenda t1_j9av16n wrote
That's not what I mean by precedent. I mean that there is no legal precedent.
bairbs t1_j9aygil wrote
Lol, if you think these huge companies don't have teams of lawyers advising them on how to legally create models, you're nuts. OpenAI has everything to gain and nothing to lose by trying to challenge the precedents that are already set.
But keep doing your own research. Maybe they'll hire you (or maybe they already do)
gurenkagurenda t1_j9b8gzz wrote
> OpenAI has everything to gain and nothing to lose by trying to challenge the precedents that are already set.
Please cite the case that you're talking about which you claim sets this precedent. Thanks.
bairbs t1_j9agew9 wrote
People can do whatever they want with copyrighted material privately. It's when you release or try to commercialize the work that problems arise. Nothing is stopping AI companies from scraping and training all day. But in order to release the result, they should compensate the copyright holders.
Slippedhal0 t1_j9asf0r wrote
Technically that's not correct, it's just very hard to enforce against private use. For example, if you copy a movie, even for private use (except in very specific circumstances), that's illegal, and people have been charged.
That said, the public release point is what I was thinking of anyway.
bairbs t1_j9awnxc wrote
Technically, if you bought the movie, you could copy it for your own use. You just can't share it, which to your point is very hard to enforce for private use outside of the internet.
I'm thinking of fair use when I say "do whatever they want with copyright privately"