goj-145 t1_j80ufu1 wrote on February 10, 2023 at 8:12 PM

#1,773,700

We're going to find out soon with the Getty lawsuit. Until then, gray area.

Tlaloc-Es OP t1_j80xdxu wrote on February 10, 2023 at 8:31 PM

#1,773,832

But anyway, it's hard to demonstrate which dataset a model was trained on, right? In the Getty case you can probably get images that look like the Getty image dataset, but for a predictor? And in a case like this, where there wasn't any law or precedent, can you still lose the lawsuit and have to pay?

goj-145 t1_j80xlao wrote on February 10, 2023 at 8:33 PM

#1,773,846

Replying to Tlaloc-Es (#1,773,832)

Not really hard when the model is spitting out watermarked images.

DataGOGO t1_j813dui wrote on February 10, 2023 at 9:10 PM

#1,774,134

It is legal until a court says otherwise.

[deleted] t1_j81dlwc wrote on February 10, 2023 at 10:19 PM

#1,774,608

[deleted]

Tlaloc-Es OP t1_j81dot2 wrote on February 10, 2023 at 10:20 PM

#1,774,616

Replying to DataGOGO (#1,774,134)

And could there be any retroactive penalty?

DataGOGO t1_j81fm63 wrote on February 10, 2023 at 10:34 PM

#1,774,711

Replying to Tlaloc-Es (#1,774,616)

Not likely. If found illegal, you would have to "remove" the offending "images".

[deleted] t1_j82rcdb wrote on February 11, 2023 at 4:54 AM

#1,777,074

[deleted]

Fragrant_Weakness547 t1_j82yp54 wrote on February 11, 2023 at 6:13 AM

#1,777,418

Replying to [deleted] (#1,777,074)

>That is the Million Dollar question (or really hundred million dollar question in terms of legal fees)

It's worth a lot more than that. The profit margins of AI-focused companies are kind of on the line here.

Miguel33Angel t1_j830cig wrote on February 11, 2023 at 6:33 AM

#1,777,481

Replying to goj-145 (#1,773,846)

He's asking about the case of a predictor, e.g. ResNet or other models that just categorize.

goj-145 t1_j831dqg wrote on February 11, 2023 at 6:46 AM

#1,777,519

Replying to Miguel33Angel (#1,777,481)

The question is can you use copyrighted info to train a model. The answer is we don't know yet.

The current lawsuit that will set precedent on this concerns image generation by a model trained on copyrighted Getty images. It's evident that Getty images were used because the watermark shows up in the model's output many times, which is the answer to "how can they prove it".

Once that is defined, then we will know if it is legal or not in those jurisdictions. And then we will get to the "do we do it anyways even though it's illegal?"

2blazen t1_j8378vr wrote on February 11, 2023 at 8:01 AM

#1,777,744

Replying to goj-145 (#1,773,846)

So you're saying Stability wouldn't have issues if they hired an intern to git clone a watermark remover and put the images through it first?

goj-145 t1_j83801h wrote on February 11, 2023 at 8:11 AM

#1,777,778

Replying to 2blazen (#1,777,744)

It would have been MUCH harder to prove if they spent a day preprocessing the images first!

cajmorgans t1_j8416i1 wrote on February 11, 2023 at 2:10 PM

#1,779,017

Even if it becomes illegal, the democracy of Machine Learning depends on it being legal. If Getty wins this, it would mean that a few pretty large companies would be the only ones that can build large models because they “own” most of the data. Facebook, for example, does a lot to prevent people from scraping public data from their apps.

[deleted] t1_j841k34 wrote on February 11, 2023 at 2:13 PM

#1,779,046

Replying to cajmorgans (#1,779,017)

[deleted]

Ulfgardleo t1_j84fdfl wrote on February 11, 2023 at 3:28 PM

#1,779,891

Replying to 2blazen (#1,777,744)

If it is illegal now, it would be super illegal then, because removing watermarks on its own typically violates the license of the material.

The question is 100% the same as "can I include GPLv3 code in my commercial closed-source repository if I remove the license headers and ensure that the code is never published?"

Ulfgardleo t1_j84fokp wrote on February 11, 2023 at 3:30 PM

#1,779,914

Replying to cajmorgans (#1,779,017)

Legally the data is not public, and the fact that Facebook is actively trying to prevent scraping makes it very difficult to argue otherwise.

Legally, the data cannot be public. The users give Facebook a non-exclusive license with limited rights to store and process the data. It does not follow from this that anyone who sees the shared images (for example) has a right to process them as well. If that were the case, the terms (https://www.facebook.com/terms.php, section 3.1) would have to state under which license the works are redistributed by Facebook.

sweatierorc t1_j854tn3 wrote on February 11, 2023 at 6:25 PM

#1,781,287

Replying to goj-145 (#1,773,700)

On the training part, it is probably legal, though you need to be careful about something like GDPR. E.g. for facial recognition, there are extra rules.

The "sharing model and/or its prediction" is the gray area.

Edit: typo

currentscurrents t1_j85rpol wrote on February 11, 2023 at 9:04 PM

#1,782,638

Replying to goj-145 (#1,777,778)

They use the open LAION-5B dataset; everybody knows what's in there.

Still, some preprocessing and deduplication would have been a good idea just for output quality.
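As a rough illustration of the deduplication step mentioned above, here is a minimal sketch that drops byte-identical copies by hashing file contents. The function name and the `(name, raw_bytes)` input shape are assumptions made for the example, not anything from an actual training pipeline, and this approach would not catch resized or recompressed near-duplicates.

```python
import hashlib


def deduplicate_exact(images):
    """Keep the first occurrence of each byte-identical image.

    `images` is an iterable of (name, raw_bytes) pairs. Both the function
    name and this input shape are illustrative assumptions.
    """
    seen = set()
    unique = []
    for name, data in images:
        # Identical bytes always produce the same SHA-256 digest.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique


# Toy data standing in for real image files.
sample = [
    ("a.jpg", b"\x01\x02"),
    ("b.jpg", b"\x03"),
    ("a_copy.jpg", b"\x01\x02"),
]
print(deduplicate_exact(sample))  # ['a.jpg', 'b.jpg']
```

Catching near-duplicates would need something like perceptual hashing or embedding similarity instead of an exact content hash.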

a_user_to_ask t1_j88f7f6 wrote on February 12, 2023 at 12:35 PM

#1,787,529

In an ideal world, each image in a dataset used for machine learning would have to be identified with its author and license. But I understand that this is difficult to achieve, because images are copied all over the web and it is difficult to locate the original source.

So, I have no doubt about the illegality of using images from web scraping. How easy it is to win or lose a lawsuit, and to prove whether you used that data or not, is another matter.
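To make the "author and license per image" idea concrete, here is a minimal sketch of what such a record and a filter over it could look like. The `ImageRecord` type, the `usable` helper, the example URLs, and the allow-list are all hypothetical choices for illustration; which licenses actually permit training is exactly the open question in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageRecord:
    """Provenance metadata for one image (hypothetical schema)."""
    url: str
    author: Optional[str]
    license: Optional[str]  # e.g. an SPDX identifier, or None if unknown


# Hypothetical allow-list, chosen only for the example.
ALLOWED_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


def usable(records):
    """Keep records that have a known author and an allow-listed license."""
    return [
        r for r in records
        if r.author is not None and r.license in ALLOWED_LICENSES
    ]


records = [
    ImageRecord("https://example.com/1.jpg", "alice", "CC-BY-4.0"),
    ImageRecord("https://example.com/2.jpg", None, "CC0-1.0"),
    ImageRecord("https://example.com/3.jpg", "bob", None),
]
print([r.url for r in usable(records)])  # ['https://example.com/1.jpg']
```

Scraped images with missing or wrong metadata would simply be dropped here, which shows why this is hard to do at web scale.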

Tlaloc-Es OP t1_j89x7hi wrote on February 12, 2023 at 7:22 PM

#1,790,222

Replying to a_user_to_ask (#1,787,529)

I think the same, but for example, if I scrape images from Google that are tagged as copyleft (but wrongly labeled), or that have no license info, who is guilty?

[D] Is it legal to use images or videos with copyright to train a model?
