CallFromMargin t1_j6gmmbk wrote on January 30, 2023 at 4:22 AM

Reply to comment by SeaweedSorcerer in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee

Well, that's a whole load of bullshit.

IAmDrNoLife t1_j6gxfuq wrote on January 30, 2023 at 6:08 AM

Exactly, because it's not true.

Machine Learning (or rather, Deep Learning and Neural Networks) does not "compress the data". These models analyse data. They don't store any original art used in the training (otherwise, these models would be thousands of terabytes in size; instead, they are a few gigabytes).
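To put rough numbers on the size argument, here is a back-of-the-envelope sketch. Both figures are assumptions picked for illustration ("a few gigabytes" for the model, "billions" of training images), not measurements of any specific model, and `bytes_per_image` is a made-up helper name.

```python
# Back-of-the-envelope check; both figures below are illustrative assumptions.
MODEL_SIZE_BYTES = 4 * 10**9   # assumed: a model file of "a few gigabytes"
TRAINING_IMAGES = 2 * 10**9    # assumed: a training set of "billions" of images


def bytes_per_image(model_bytes: int, image_count: int) -> float:
    """Average number of model bytes available per training image."""
    return model_bytes / image_count


budget = bytes_per_image(MODEL_SIZE_BYTES, TRAINING_IMAGES)
print(f"{budget:.1f} bytes of model per training image")
```

With these assumed figures the average works out to a couple of bytes per image, far less than any image file. Keep in mind it is only an average across the whole training set.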

Furthermore, these models do not replicate the art it has been trained on. Every single piece of art generated by AI, is something entirely new. Something that has never been seen before. You can debate if it takes skill, but you can't debate that it's something new.

This video is an excellent source of information regarding this topic. It's created by a professional artist who has embraced AI-generated art as a source of inspiration and as a way to speed up their own work.

Even furthermore, courts have indeed shown previously that Google IS allowed to data mine a bunch of data, and use it. Google has its "Google Books" project, a record of an enormous number of books that was built via data mining. Of course, there's a difference between the Google Books project and AI art models, due to the end result (one is a collection of existing stuff, and the other can create new stuff). But the focus here was on the data mining.

One thing that a lot of people don't seem to know: You do not own a style. You cannot copyright a style. There have been a lot of artists who complain because "it's possible for people to just mimic my work". Yes, that is true, but it has always been true, simply because you do not own "your" style. People have always been able to go to another person and say "please make some art, in the style of this person". You have copyright on individual pieces of art, but not on the general style that you use to create said art.

Here comes my own personal opinion:

Tools using AI are the future. People are not going to lose their jobs because an AI makes them obsolete - people are going to lose their jobs if they refuse to use AI to improve their workload.

Take software development. These models can generate code from the bottom up to an insane degree of detail. You no longer have to spend time on all the boring stuff, actually writing the code; you can focus on the problem-solving. The same goes for art: with AI tools, you get to skip the boring, monotonous part of your workload, and you can focus on the parts that actually mean something.

CallFromMargin t1_j6gxzgp wrote on January 30, 2023 at 6:14 AM

The "they re-create art" argument comes from a paper that is widely shared on Reddit. Thing is, that paper itself mentions that the researchers trained their own models on small datasets, ranging from 300 pictures to a few thousand, and they started seeing novel results at 1000 images.

Also, current bots can't generate good code, not yet, but they have their own uses. As an example, a client I recently had asked me to design a patching system (small shop, with 100 or so servers, they had no use for automated patching up to now), and some simple automation. You know, the type of weekend jobs you do to earn some extra cash. Well, since they are using Azure, I went with Azure Automation, but I had no idea how it works. Well, ChatGPT told me how it works, in detail, gave me some code that might work, etc. But the most important thing by far was the high-level overview; it saved me hours of reading documentation. This shit is the future, but not how you might expect it to be.

Ronny_Jotten t1_j6i3uog wrote on January 30, 2023 at 2:25 PM

I don't know what paper you're referring to, but there's this one:

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

It clearly shows, at the top of the first page, the full Stable Diffusion model, trained on billions of LAION images, replicating images that are clearly "substantially similar" copyright violations of its training data. The paper cites several other papers regarding the ability of large models to memorize their inputs.

It may be possible to tweak the generation algorithm to no longer output such similar images, but it's clear that they are still present in the trained model network.

Mr_ToDo t1_j6j481z wrote on January 30, 2023 at 6:23 PM

Well, they did both in that paper. But it would be interesting to know what the ones at the top were from. I know that there's one I saw further down in high hit percents further down but with as nice as they are I don't know why the rest don't if they belong to that model.

Ronny_Jotten t1_j6kjrlv wrote on January 30, 2023 at 11:50 PM

The paper explains what the ones at the top were from. It's using Stable Diffusion 1.4. See page 7: Case Study: Stable Diffusion, page 14: C. Stable Diffusion settings, and page 15 for the prompts and match captions. Sorry, the rest of your comment is incomprehensible to me...

Mr_ToDo t1_j6mwtay wrote on January 31, 2023 at 1:50 PM

OK, that's on me. I hit the references and somehow thought I was done with the paper; I didn't think they would have the captions they used underneath that. I admit that was poor due diligence on my part. Apologies.

Ronny_Jotten t1_j6hpnnj wrote on January 30, 2023 at 12:19 PM

> They don't store any original art used in the training [...] these models do not replicate the art it has been trained on. Every single piece of art generated by AI, is something entirely new. Something that has never been seen before. You can debate if it takes skill, but you can't debate that it's something new

They can very easily reproduce images and text that are substantially similar to the training input, to the extent that it is clearly a copyright violation.

Image-generating AI can copy and paste from training data, raising IP concerns | TechCrunch

> courts have indeed shown previously that Google IS allowed to data mine a bunch of data [...] there's a difference [...] But the focus here was on the data mining.

In the case of the Google Books search product, the scanning of copyrighted works ("data mining") was found to be fair use. That absolutely does not mean that all data mining is fair use. Importantly, it was found that it had no economic impact on the market for the actual books; it did not replace the books. In order for the code/text/image AI generators' "data mining" of copyrighted works to be fair use, it will also have to meet that test. Otherwise, the mining is a copyright violation.