Submitted by Tlaloc-Es t3_10z1jxz in MachineLearning
goj-145 t1_j80ufu1 wrote
We're going to find out soon with the Getty lawsuit. Until then, it's a gray area.
sweatierorc t1_j854tn3 wrote
On the training side, it is probably legal, though you need to be careful about regulations like GDPR. For example, facial recognition is subject to extra rules.
Sharing the model and/or its predictions is the gray area.
Edit: typo
Tlaloc-Es OP t1_j80xdxu wrote
But anyway, isn't it hard to demonstrate which dataset a model was trained on? In the case of Getty, you can probably get outputs that look like images from the Getty dataset, but what about a predictor? And in a case like this, where "there wasn't any law" or precedent, can the defendant still lose the lawsuit and have to pay?
goj-145 t1_j80xlao wrote
Not really hard when the model is spitting out watermarked images.
Miguel33Angel t1_j830cig wrote
He's asking about the case of a predictor, e.g. ResNet or other models that just categorize.
goj-145 t1_j831dqg wrote
The question is can you use copyrighted info to train a model. The answer is we don't know yet.
The current lawsuit that will set precedent on this is about image generation using copyrighted Getty images as training data. It's proven that Getty images were used because the Getty watermark shows up in the model's output many times, which answers "how can they prove it?"
Once that is decided, we will know whether it is legal in those jurisdictions. And then we will get to the "do we do it anyway even though it's illegal?" question.
2blazen t1_j8378vr wrote
So you're saying Stability wouldn't have issues if they hired an intern to git clone a watermark remover and put the images through it first?
goj-145 t1_j83801h wrote
It would have been MUCH harder to prove if they spent a day preprocessing the images first!
currentscurrents t1_j85rpol wrote
They use the open LAION-5B dataset; everybody knows what's in there.
Still, some preprocessing and deduplication would have been a good idea just for output quality.
Ulfgardleo t1_j84fdfl wrote
If it is illegal now, it would be super illegal then, because removing watermarks typically violates the license of the material on its own.
The question is 100% the same as "Can I include GPLv3 code in my commercial closed-source repository if I remove the license headers and ensure that the code is never published?"