Comments

goj-145 t1_j80ufu1 wrote

We're going to find out soon with the Getty lawsuit. Until then, gray area.

38

Tlaloc-Es OP t1_j80xdxu wrote

But anyway, it is hard to demonstrate which dataset a model was trained on, right? In the Getty case you can probably get output images that look like the Getty image dataset, but what about a predictor? And in a case like this, where there wasn't any law or precedent yet, can you still lose the lawsuit and have to pay?

−1

DataGOGO t1_j813dui wrote

It is legal until a court says otherwise.

5

goj-145 t1_j831dqg wrote

The question is can you use copyrighted info to train a model. The answer is we don't know yet.

The current lawsuit that will set precedent on this concerns image generation using copyrighted Getty images to train a model. It's proven that Getty images were used because the watermark shows up in the model's output many times, which is the answer to "how can they prove it".

Once that is decided, we will know whether it is legal or not in those jurisdictions. And then we will get to the question of "do we do it anyway even though it's illegal?"

3

cajmorgans t1_j8416i1 wrote

Even if it becomes illegal, the democratization of machine learning depends on it being legal. If Getty wins this, it would mean that a few pretty large companies would be the only ones that can build large models, because they “own” most of the data. Facebook, for example, does a lot to prevent people from scraping public data from its apps.

3

Ulfgardleo t1_j84fdfl wrote

If it is illegal now, it would be super illegal then, because removing watermarks on its own typically violates the license of the material.

The question is 100% the same as "can I include GPLv3 code in my commercial closed-source repository if I remove the license headers and ensure that the code is never published?"

0

Ulfgardleo t1_j84fokp wrote

Legally, the data is not public, and the fact that Facebook is actively trying to prevent scraping makes it very difficult to argue otherwise.

Legally, the data cannot be public. The users give Facebook a non-exclusive license with limited rights to store and process the data. It does not follow that anyone who sees the shared images (for example) has the right to process them as well. If that were the case, the terms (https://www.facebook.com/terms.php 3.1) would have to state under which license the works are redistributed by Facebook.

2

sweatierorc t1_j854tn3 wrote

On the training part, it is probably legal, though you need to be careful about something like GDPR. E.g. for facial recognition, there are extra rules.

The "sharing model and/or its prediction" is the gray area.

Edit: typo

1

a_user_to_ask t1_j88f7f6 wrote

The owner of an image is the one who decides how it is used. "All rights reserved" means just that: the owner holds the rights to any use of the images, now and for whatever someone invents in the future.

In an ideal world, each image in a dataset used for machine learning would be identified with its author and license. But I understand that this is difficult to achieve, because images are copied around the web and it is difficult to locate the original source.

So I have no doubt that using images from web scraping is illegal. A separate matter is how easy it is to win or lose a lawsuit, and to prove whether you used that data or not.

1