leroy_hoffenfeffer t1_j4vusek wrote on January 18, 2023 at 4:55 PM

I'm not sure why the art tools themselves are being targeted and not the dataset developers like LAION.

Stable Diffusion, Midjourney, etc. are just using datasets from LAION. I would think the buck stops with them in terms of getting permission / rights to use image URLs in their datasets.

I don't think it's fair to target the AI developers themselves here; to some extent they can be considered users of the dataset.

Arcosim t1_j4wm8ap wrote on January 18, 2023 at 7:42 PM

>Stable Diffusion, Midjourney, etc. are just using datasets from LAION.

LAION uses Common Crawl to crawl the net, and Common Crawl obeys the robots.txt rules of any site it crawls. Getty Images has no case here: if they didn't want their content crawled, they should have specified that in their robots.txt file.

Furthermore, Getty is one of the scummiest companies out there. They pretended to hold the copyright to tens of millions of images in the Library of Congress. They also take photos that photographers published under a CC license and then try to shake those photographers down for money.

leroy_hoffenfeffer t1_j4wmjso wrote on January 18, 2023 at 7:44 PM

I know how they obtained URLs using CommonCrawl. CommonCrawl isn't the issue.

CommonCrawl only returns URLs. LAION had to take those URLs and download the content found at each one.

Arcosim t1_j4wqzbl wrote on January 18, 2023 at 8:11 PM

The point is, if they didn't want that content scraped, they should have put a rule disallowing it in their robots.txt.
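
For illustration only, here is a minimal sketch of what such a rule looks like and how a well-behaved crawler would check it, using Python's standard `urllib.robotparser`. The robots.txt contents, the example.com URLs, and the `is_allowed` helper are assumptions made up for this example; `CCBot` is the user agent Common Crawl's crawler identifies itself with.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not copied from any real site.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    image_url = "https://example.com/photos/1.jpg"
    # CCBot is blocked site-wide; other crawlers are only kept out of /private/.
    print("CCBot:", is_allowed(ROBOTS_TXT, "CCBot", image_url))
    print("SomeOtherBot:", is_allowed(ROBOTS_TXT, "SomeOtherBot", image_url))
```

Note that robots.txt is a voluntary convention: a rule like this only has an effect when the crawler chooses to honor it.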

leroy_hoffenfeffer t1_j4wvkoo wrote on January 18, 2023 at 8:39 PM

A few issues with this thought process:

1. Even if folks were to retroactively add or edit robots.txt files to disallow scraping, that does nothing to address the content already scraped and downloaded. So the issue of LAION downloading potentially copyrighted works is still in play.
2. I think it's an extremely flaky argument to say "Well, those artists should have edited their robots.txt files to disallow the thing they didn't know was happening". It's a very real possibility that the artists in question didn't even know this kind of thing was happening, let alone that there was something they could do about it. I'm not sure a court would view that argument as sound.
3. I think LAION is a European company. This is relevant because of their FAQ:

> If you found your name only on the ALT text data, and the corresponding picture does NOT contain your image, this is not considered personal data under GDPR terms. Your name associated with other identifiable data is. If the URL or the picture has your image, you may request a takedown of the dataset entry in the GDPR page. As per GDPR, we provide a takedown form you can use.

So, LAION is beholden to GDPR terms. I think the potential exists for someone to ask "Well... if my picture and data are considered personal data, why isn't the content I produce also considered personal data?" That's how the current GDPR guidelines work, but I think we may end up seeing edits or rewrites of them given cases like this.

It's neither reasonable nor sound to say "Artists should have taken this very technical detail into account in order to protect their work."

[deleted] t1_j4w6kd0 wrote on January 18, 2023 at 6:06 PM

LAION's license said not to use their dataset commercially, as it was for research only, IIRC.

leroy_hoffenfeffer t1_j4w7e4v wrote on January 18, 2023 at 6:11 PM

That's definitely something that will be brought up in court.

From the layman's perspective, it would seem that Midjourney, etc. are no longer operating as research outlets and are instead offering a commercial product. Corporations are surely treating it like a commercial product at least.

superluminary t1_j4z66m7 wrote on January 19, 2023 at 7:22 AM

Because LAION doesn’t have any money.