Submitted by Tough_Gadfly t3_116wnco in technology
Slippedhal0 t1_j99zasl wrote
It's the same argument that artists make when complaining about copyrighted artwork being used as training data.
At some point there will be a major ruling about how companies training AI need to approach copyright for their training data sources, and if they rule in favour of copyright holders it will probably severely slow AI progress as systems to request permission are built.
Although I could maybe see a fine-tuned AI like Bing being less affected, because it cites sources rather than opaquely using previously acquired knowledge.
gurenkagurenda t1_j9ae1ic wrote
I don’t think it will slow AI at this point, so much as it will concentrate control over AI even more into the hands of well funded, established players. OpenAI has already hired an army of software developer contractors to produce training data for Codex. The same could be done even more cheaply for writers. The technology is proven now, so there’s no risk anymore. We know that you just need the training data.
So the upshot would just be a higher barrier to entry. Training a new model means not only funding the compute, but also paying to create the training set.
bairbs t1_j9agwyq wrote
Exactly. This is what big tech has been doing already to create legal and ethical data.
The training data is the bottleneck. OpenAI is trying to see if they can pull a fast one by releasing models using copyrighted material
gurenkagurenda t1_j9amk4h wrote
They’re not “pulling a fast one”. There’s no precedent here, and there’s a boatload of lawyers who agree that this is fair use. There are also a number who believe that it won’t be. The courts will have to figure it out, but until then, nobody knows how it will play out.
bairbs t1_j9ao00n wrote
They actually are. The precedent has been to use public domain material (which is why there are so many fine art style GANs), create your own data, pay for data to be created, pay for existing data, or keep the models private. There are plenty more artists and other jobs than lawyers who know this isn't fair use and will be negatively impacted if these companies are allowed to continue this practice.
gurenkagurenda t1_j9av16n wrote
That's not what I mean by precedent. I mean that there is no legal precedent.
bairbs t1_j9aygil wrote
Lol, if you think these huge companies don't have teams of lawyers advising them on how to legally create models, you're nuts. OpenAI has everything to gain and nothing to lose by trying to challenge the precedents that are already set.
But keep doing your own research. Maybe they'll hire you (or maybe they already do)
gurenkagurenda t1_j9b8gzz wrote
> OpenAI has everything to gain and nothing to lose by trying to challenge the precedents that are already set.
Please cite the case that you're talking about which you claim sets this precedent. Thanks.
bairbs t1_j9agew9 wrote
People can do whatever they want with copyrighted material privately. It's when you release or try to commercialize the work that problems arise. Nothing is stopping AI companies from scraping and training all day. But in order to release the result, they should compensate the copyright holders.
Slippedhal0 t1_j9asf0r wrote
Technically that's not correct, it's just very hard to enforce against private use. For example, if you copy a movie, even for private use (except in very specific circumstances), that's illegal, and people have been charged.
That said, the public release point is what I was thinking of anyway.
bairbs t1_j9awnxc wrote
Technically, if you bought the movie, you could copy it for your own use. You just can't share it, which to your point is very hard to enforce for private use outside of the internet.
I'm thinking of fair use when I say "do whatever they want with copyright privately"