goj-145 t1_j80xlao wrote on February 10, 2023 at 8:33 PM

Reply to comment by Tlaloc-Es in [D] Is it legal to use images or videos with copyright to train a model? by Tlaloc-Es

Not really hard when the model is spitting out watermarked images.

Miguel33Angel t1_j830cig wrote on February 11, 2023 at 6:33 AM

He's asking about the case of a predictor, i.e. ResNet or other models that just categorize.

goj-145 t1_j831dqg wrote on February 11, 2023 at 6:46 AM

The question is can you use copyrighted info to train a model. The answer is we don't know yet.

The current lawsuit that will set precedent on this is about image generation from a model trained on copyrighted Getty images. It's proven that Getty images were used because the watermark shows up in the model's output many times, which is the answer to "how can they prove it?"

Once that is decided, we will know whether it is legal in those jurisdictions. And then we will get to "do we do it anyway even though it's illegal?"

2blazen t1_j8378vr wrote on February 11, 2023 at 8:01 AM

So you're saying Stability wouldn't have issues if they hired an intern to git clone a watermark remover and put the images through it first?

goj-145 t1_j83801h wrote on February 11, 2023 at 8:11 AM

It would have been MUCH harder to prove if they spent a day preprocessing the images first!

currentscurrents t1_j85rpol wrote on February 11, 2023 at 9:04 PM

They use the open LAION-5B dataset; everybody knows what's in there.

Still, some preprocessing and deduplication would have been a good idea just for output quality.

Ulfgardleo t1_j84fdfl wrote on February 11, 2023 at 3:28 PM

If it is illegal now, it would be super illegal then, because removing watermarks on its own typically violates the license of the material.

The question is 100% the same as "Can I include GPLv3 code in my commercial closed-source repository if I remove the license headers and ensure that the code is never published?"