Viewing a single comment thread. View all comments

firem1ndr t1_j5lmpuf wrote

yeah, that's basically how all these "ai" work. What nobody's considering with all this AI writing stuff is that they're scraping from existing writing. If everything becomes AI writing, then it's just a cascade of machine plagiarism. At some point in the process, someone has to actually acquire knowledge and expertise, and form opinions, to write out an argument.

88

greenappletree t1_j5lqou1 wrote

This is going to be an interesting problem. Just today I heard that ChatGPT, when asked to code something, was basically just scraping from GitHub. At what point does an AI infringe on copyright, and who is responsible? Developers are just going to shrug and say the AI is a black box.

31

ciarenni t1_j5m0jn5 wrote

> I heard that ChatGPT, when asked to code something, was basically just scraping from GitHub. At what point does an AI infringe on copyright and who is responsible.

Microsoft has already done this. Here's the short version.

A few years back, Microsoft bought GitHub. Repositories on GitHub have a license, specified by the author, stating how they can be used. These licenses range from "lol, do whatever, I don't care, but don't expect any support from me", to something akin to standard copyright.

Microsoft also makes Visual Studio, a program for writing code with lots of niceties to help people develop more efficiently and easily than writing code in notepad.exe. A recent version had a feature called "Copilot" which basically reads the half-built code you have and uses some machine learning to offer suggestions.

Now then, as an exercise for the reader, knowing that Microsoft owns GitHub and also Visual Studio, where do you think they got the data to train that ML model? If you guessed "from the code on GitHub", you'd be right! And bonus points if you followed up with "but wait, surely they only used code they were allowed to based on the license specified?" Hint: No. It's literally plagiarism.

21

Nebuli2 t1_j5pwvqt wrote

Yep, so they just let you know that they pass off any responsibility for infringing on licenses to you, the user.

0

Mgrecord t1_j5luylm wrote

I believe there's already a lawsuit against DALL-E and its use of copyrighted artwork.

5

LAwLzaWU1A t1_j5nns9q wrote

Also worth pointing out that it's being done by an organization that represents companies like Disney.

My guess is that massive companies like Disney are very interested in setting a precedent that if their pictures are used for learning, they deserve payment. They will have their own datasets to train their AI on anyway, so they will still be able to use it.

These types of lawsuits will only serve to harm individuals and small companies, while giving massive companies a big advantage.

12

natepriv22 t1_j5ntkrw wrote

No, they just want to use these models for their own profit, while making fan art generation or creation illegal.

They know they can't stop their pictures from being used for learning, because they're publicly available. There's legal precedent for this.

What they care about is that you can generate an Iron Man-style picture and post it online without licensing the character.

What's ironic is that this lawsuit will fail anyway, even with corporate backing. As I just mentioned, the model can't generate exact copies of pictures, only "style-like" pictures.

2

Mgrecord t1_j5o513j wrote

But isn’t the “style” or “essence” what’s actually copyrighted? I’m not sure Fair Use will cover this.

2

natepriv22 t1_j5o72qn wrote

No, the final output is what's copyrighted. It's impossible to copyright a style because it's too abstract.

Example: Disney copyrights drawings of Mickey Mouse. Mickey Mouse is a character that resembles a mouse, walks upright, has little mouse ears, a boopy nose, red pants, and yellow shoes.

This is a character that Disney came up with and which is unique. If someone were to draw something according to these exact specifications, they would very likely end up with a drawing closely or almost completely resembling Mickey Mouse. By trying to redistribute something so obviously similar, you are in danger of breaching someone's copyright.

On the other hand, a style could be "cartoons", or, to take it to the simplest level possible, "drawing only with circles".

While you may have been the first to use a style, you have no copyright claim over it. It's a very abstract thing, further removed from the artist. The style is a medium used to produce a creation; it's more like a tool, not the ultimate product. If you and Disney both started drawing with circles, you would ultimately arrive at very different products, no matter how similar the goal may be (draw a mouse using only circles).

In other words, styles are almost mathematical arrangements of colors, movements, dots, etc. You use this mathematical formula to produce, for example, a character. This character is unique; very likely only you could have come up with it. The style, however, is very likely to be discovered by other people. Trying to copyright a style would be like trying to copyright a math formula.
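The "style as a formula" analogy can be made literal with a tiny sketch (purely illustrative; `circles_style` and the coordinates are made up for this example). The style is a reusable rule anyone can apply; the product is what you choose to render with it:

```python
# The comment's analogy made concrete: a "style" as a reusable rule
# (draw everything with circles), separate from the "product"
# (which circles you actually draw).

def circles_style(points, radius=1.0):
    """Render any set of (x, y) points in the 'circles-only' style:
    every point becomes a circle of the given radius."""
    return [{"cx": x, "cy": y, "r": radius} for x, y in points]

# Two artists share the same style but produce different products:
mouse_sketch = circles_style([(0, 0), (1, 2), (-1, 2)])  # a mouse-ish face
sun_sketch = circles_style([(5, 5)], radius=3.0)         # a sun
print(mouse_sketch)
print(sun_sketch)
```

The rule itself carries no creative content until someone chooses what to draw with it, which is the commenter's point about style versus product.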

TLDR: sorry for the messy writing, I was trying to put all my thoughts together in one go. For these reasons, AI can never truly plagiarize or infringe copyright on its own. Styles are non-copyrightable, and a style is almost exclusively what matters to the AI: arranging math to try to satisfy your desired output. Unless it has a reference point, it will pretty much never arrive at the same conclusion you came to.

Extra: imagine a world where style is copyrighted instead of just the product or output. It would be the destruction of creativity and art. Imagine if Disney were able to copyright the cartoon or 3D cartoon style. They would be the only ones able to create cartoons and 3D cartoons in the industry, gatekeeping and locking everyone else out under threat of lawsuits.

Now that would be a true dystopia...

2

natepriv22 t1_j5o7jzl wrote

Just to add:

In case I made it really confusing by being all over the place:

Style = like math, discovered

Art product or output = like an idea, invented

Creativity combines the use of a style, to produce a product or output that expresses something. Without the product or output, what can a style express?

Imagine trying to explain Van Gogh's style without his product or output. It would be very mathematical and scientific: turbulent lines + bright colors + lowered clarity.

2

M_Mich t1_j5ork5f wrote

And if expressionism could be copyrighted, it wouldn't have become a style of art. It would have been limited to the first artist to do it, and then everyone else would have been sued.

2

Mgrecord t1_j5o8cl6 wrote

Thanks for the thoughtful explanation. It will be interesting to see how this plays out. The technology is going to move much faster than the lawsuits!

1

Fafniiiir t1_j5wym5z wrote

Human beings are not AI; I don't think the two can just be compared.
A human being influenced by another artist is not the same as an AI, and a human being can't copy another artist as accurately, broadly, and quickly as an AI can.

Even if you practice Van Gogh's work your entire life, your work will never actually look like his; there will always be noticeable differences.
There are a lot of artists who do try to directly copy other artists' styles, and it's always very apparent, like a worse copycat.

The problem with AI, which is unique to it compared to humans, is that it can be fed an artist's work and spit out finished illustrations in that style in seconds.
What is the point of hiring the artist whose work was fed into the AI for it to learn from?
The artist is essentially being competed out of their own work with no way of combating it or keeping up with it.
Not to mention that it also competes them out of their own search tag: when you search for some artists, you literally get page after page of AI generations instead of the actual artist's work.

Things like fair use take this into consideration too: the damages, or even potential damages, caused to the person.
And AI is fundamentally different from humans in this regard; another human artist can never do what an AI can, and can't be judged the same.

1

natepriv22 t1_j5xrs8o wrote

>Human beings are not ai, I don't think that the two can just be compared.

They absolutely can be compared though. They are two forms of intelligence, one of which is built on the principles of intelligence of the other.

>A human being being influenced by another artist is not the same as an ai, and a human being can't copy another artist as accurately, broadly and quickly as an ai can.

It's not exactly the same, sure, but it's broadly similar. You don't store 100% of the info you learn and see, because that would be too much data. So you remember processes, rules, and outcomes much better, just like an AI would.

>Even if you practice Van Goghs work your entire life your work will never actually look like his there will always be noticeable differences. There's a lot of artists who even do try to directly copy other artists styles and it's always very apparent and like a worse copycat.

I mean, the average person, and I'm pretty sure both of us too, would not be able to distinguish the original from the copy unless we had more info. You can do a simple test online, and let's see if you manage to distinguish the two. If you do get a high score, then congrats! You are better at spotting copied art than the average human.

Furthermore, what you're describing is exactly how AI works. Unless you use an img2img model, which is not what the majority of AI art is, it would be close to impossible for you to produce the same output, just like a human. Again, you could test this right now: go to an AI art app like Midjourney or Stable Diffusion, type in "Van Gogh Starry Night", and see what outputs you get.

>it can be fed with an artists work and spit out finished illustrations in that style in seconds.

First of all, not exactly. As I've said before, the model never contains the original input, so it's only learning the process, like a human.

Second of all, you can do the same thing! It'll just take you more time. Say your friend gives you 100 pictures of a new art style called "circly", which is art made purely of circles. He gives you days, weeks, or months, however long you need, to output something in this new style: he wants a picture of New York made only with circles. So you learn the style and create the new drawing or painting for him. You did almost exactly what an AI does, except it took you longer, which is normal for a human being.

>What is the point of hiring the artist who's work was input into the ai for it to learn from it?

What is the point of hiring a horse carriage driver, when the concept of how a carriage works, was used to create the "evil car"?

First, this is a loaded and emotional question. All kinds of art were used without discrimination; no one was specially selected.

Secondly, again, the model will not be able to output the same thing. It can draw in the same style, but the output will not be the same; mathematically, it just won't be. So there is still economic value in the original work.

If a process or job can be automated, and there can be a benefit for humanity, why should we stop this development? Where were you when the horse carriage was being replaced? Where are you now that fast food workers are getting automated too? Why is it OK for others but not for you? And if it's OK for no one, do you think we should regress and go back to the past?

>Not to mention that it also competes them out of their own search tag,

I have literally never met a person who searches for someone's art outside of their official channels. Even if they do, that's a marketing challenge. And what's the difference from popular artists who were already being flooded with copies from Fiverr?

A style isn't copyrightable, by the way, and thank gosh for that. So if they claim they're getting flooded with "copies of their style", that's a lie. It's not their style; it's a style they use and maybe even discovered, but they have no copyright claim over it. Imagine a world where Disney could copyright drawing in cartoonish styles... or DC comic styles... is that what you want?

1

LAwLzaWU1A t1_j5p7pc7 wrote

Making it illegal to use pictures for learning, even if they are publicly available, is exactly what the lawsuits are about, and a huge portion of people (mainly artists who have had their art used for training) support this idea.

In my opinion it's very stupid, but that's what a lot of people are asking for, without even realizing the consequences if such a system were put in place (not that it can be to begin with).

0

Fafniiiir t1_j5wxvop wrote

This isn't really true at all. Artists don't have a problem with art being used to teach AI, so long as it's consensual and artists get compensated for it.

1

LAwLzaWU1A t1_j5xtymw wrote

And the consequence of that is that Disney could say that artists who used Disney works to learn how to draw without consent owe them royalties. I don't think that is what is going to happen, but logically that is the implication.

​

If you go through some of the lawsuits being filed regarding AI, you will see that what they are arguing is not exclusive to AI art tools. For example, the lawsuit from Getty seems to just state that it should be considered illegal to "use the intellectual property of others - absent permission or consideration - to build a commercial offering of their own financial benefit".

That wording applies to human artists as well, not just AI. Did you use someone else's intellectual property to build a financial offering, such as artists on Fiverr advertising that they will "draw X in the style of Disney"? Then you might be affected by the outcome of this lawsuit, even if you don't use AI art tools. Hell, do your drawings draw inspiration from Disney? Then you have most likely used Disney as "training data" for your own craft as well, and it could therefore be argued that these rulings apply to you too.

​

I understand that artists are mainly focused on AI tools, but since an AI tool in many ways functions like a human (see publicly available data and learns from it), these lawsuits could affect human artists too.

​

And like I said earlier, the small artists who are worried that big companies might use AI tools instead of recruiting them are completely missing the mark with these lawsuits, because the big companies will be able to afford to buy and train on their own datasets. Disney has no problem getting the legal right to train its future AI on whatever data it wants. These lawsuits will only harm individuals and small companies by making it harder for them to match the AI capabilities of big companies.

​

It is my firm belief that these tools have to be as open and free to use by anyone as possible, in order to ensure that massive companies don't get an even bigger advantage over everyone else. At the end of the day, the big companies currently suing companies like Stability AI are doing so for their own gain. Getty Images doesn't want people to be able to generate their own "stock images", because that's their entire business. Disney doesn't want the average Joe to be able to recreate their characters and movies with ease. They want to keep that ability to themselves.

1

Fafniiiir t1_j5wxknu wrote

>There's legal precedent for this.

I think people are getting ahead of themselves when making these claims; these are very new legal issues.
Context matters a lot here, and laws adapt to new technology and new contexts all the time.

0

Fafniiiir t1_j5wxb0j wrote

Imo it sets a really creepy and worrisome precedent that they can just scrape everything they want.
A lot of very bad stuff has been found in these datasets, including CSAM, ISIS footage, revenge porn, and leaked nudes.
Even on a less horrifying note, there are also people's personal photographs, medical records and before-and-after photos of operations, people's private vacations, family photos, IDs, etc. You get the idea.

I do find it a bit worrisome if they can just scrape everything they want online and use it for commercial purposes like this.
At least Disney using its own copyrighted work to train an AI wouldn't run into these ethical problems.

1

LAwLzaWU1A t1_j5xvmyt wrote

I genuinely do not understand why you find that creepy and worrisome. We have allowed humans to do the exact same thing since the beginning of art, yet it seems to only be an issue when an AI does it. Is it just that people were unaware of it before, and now that they realize how the world works, they react to it?

​

If you have ever commissioned an artist to draw something for you, would you suddenly find it creepy and worrisome if you knew that said artist had once seen an ISIS video on the news? Because seeing that ISIS video did alter how the artist's brain was wired, and could potentially have influenced how they drew your picture in some way (maybe a lot, maybe just 0.0001%, depending on what picture you asked them to draw).

​

The general advice is that if you don't want someone to see your private vacation photos, don't upload them to public websites for everyone to see. Training data sets like LAION did not hack into people's phones and steal the pictures; the pictures ended up in LAION because they were posted on the public web where anyone could see them. This advice was true before AI tools were invented, and it will be true in the future as well. If you don't want someone to see your picture, don't post it on the public web.

​

Also, there would be ethical problems even if we limited this to just massive corporations. I mean, first of all, it's ridiculous to say "we should limit this technology to massive corporations because they will behave ethically". Come on.

But secondly, and more importantly, what about companies that don't produce their own content to train their AI on, but rather rely on user-submitted content? If Facebook and Instagram included a clause saying they were allowed to train their AI models on submitted images, do you think people would stop using Facebook? Hell, for all I know they might already have a clause allowing them to do this. I doubt many people are actually aware of what they allow or don't allow in the terms of service they agree to when signing up for websites.

​

Edit:

It is also important to understand the amount of data that goes into these models and data sets. LAION-5B consists of 5.85 billion images. That is a number so large that it is near impossible for a human to even comprehend it. There are good visualizations online of what one billion is; some of them use 100,000 dollars as the "base unit", which by itself is almost too big for humans to comprehend.

Even if someone were to find 1 million images of revenge porn or whatever in the dataset, that's still just 0.02% of the data set, which in and of itself is not the same as 0.02% of the final model produced by the training. We're talking about a million images maybe affecting the output by 0.02%.

How much inspiration does a human draw from the works they have seen? Do we give humans a pass just because we can't quantify how much influence a human artist drew from any particular thing they have seen and experienced?

​

I also think the scale of these data sets brings up another point. What would a proposed royalty structure even look like? Does an artist who had 100 of their images included in the data set get 100/5,000,000,000 of a dollar (0.000002% of a dollar)? That also assumes that their works actually contributed to the final model in an amount that matches their portion of images in the data set. LAION-5B is 240TB large, and a model trained on it is ~4GB; 99.998% of all data is discarded when going from training data to data model.

How do we accurately calculate the amount of influence you had on the final model, which is 0.002% the size of the data set, of which you contributed 0.000002%? Not to mention that these AIs might create internal models within themselves, which would further diminish the percentages.

Are you owed 0.000002% of 0.001%? And that also assumes that the user of the program accounts for none of the contributions either.
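The back-of-the-envelope arithmetic above can be sketched in a few lines (all figures are the comment's own ballpark numbers, a ~5 billion-image data set, ~240 TB of training data, a ~4 GB model, not authoritative statistics):

```python
# Rough royalty arithmetic from the comment above, using its own figures.

dataset_images = 5_000_000_000  # rounded LAION-5B image count
dataset_bytes = 240e12          # ~240 TB of training data
model_bytes = 4e9               # ~4 GB trained model

artist_images = 100             # images one artist contributed

# The artist's share of the data set
data_share = artist_images / dataset_images  # 2e-8, i.e. 0.000002%

# Fraction of raw training data that survives into the model at all
compression = model_bytes / dataset_bytes    # ~1.7e-5

# Naive "influence" if a payout scaled with both ratios
naive_influence = data_share * compression

print(f"data share:   {data_share:.1e} ({data_share * 100:.6f}%)")
print(f"model / data: {compression:.1e}")
print(f"naive share:  {naive_influence:.1e}")
```

Multiplying the two ratios is of course a naive model of "influence", which is exactly the commenter's point: there is no obvious, principled way to compute such a number.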

It's utterly ridiculous. These things are being discussed by people who have no understanding of how any of it works, and it really shows.

1

DigitalSteven1 t1_j5nf5ox wrote

As a developer, I couldn't care less that my GitHub was scraped. I also don't care that GitHub made Copilot. It's a tool for us. I've been developing for years, and Copilot has made my workflow significantly better. Ask the wider developer community and you'll find very similar results. We just don't care that much that our work patterns may repeat in other code somewhere.

​

According to GitHub's own surveys:

  • 88% reported being more productive
  • >90% reported being faster at their job
  • 60-75% of users reported feeling more fulfilled and less frustrated
  • 73% reported it helped them "stay in the flow"
  • 87% reported it preserved mental effort from repetitive tasks

Source: https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/

​

But if you really want to know, just go to some developer subreddit and ask...

5

Fake_William_Shatner t1_j5lwqbw wrote

No -- in the case of code, it's not "distilling a style" -- it's grabbing whole routines of code that someone wrote under certain attribution and copying restrictions, which I think GPT and some "code AI" are breaking.

There's no point in breaking up an entire function -- so it is probably more like automated cut and paste.

2

Imnot_your_buddy_guy t1_j5mr8nz wrote

Shouldn't we be demanding that the majority of these AIs be free, considering their companies just steal from our shared knowledge?

1

greenappletree t1_j5mu4ot wrote

This is an interesting point, but to play devil's advocate: couldn't the same be said about a person who learns from all this material for free, assimilates it, and makes it their own?

5

Key-Passenger-2020 t1_j5myi30 wrote

It depends on how that code is licensed. Much of it exists under the GNU General Public License.

1

GTREast t1_j5o2cvh wrote

The AI-generated content will itself grow and become part of the information landscape, in a kind of feedback loop. This is going to get interesting.

1

ViennettaLurker t1_j5os9nt wrote

> At what point does an AI infringe in copyright and who is responsible

There's the philosophical answer, and the real-world answer. We could talk theory all day, but this will all shake out when one gigantic corporation sues another gigantic corporation over it.

1

ericisshort t1_j5ltakd wrote

You're probably right, but I really don't think that shrug will hold up in court.

−1

natepriv22 t1_j5ntbw0 wrote

No that's basically how none of these AIs work. You don't understand how machine learning works. Please stop spreading misinformation and do some research first.

If the AI is plagiarizing then so are you in writing your comment, as you sure as heck didn't just learn to write out of the blue.

The model never contains the original text; can you imagine how huge it would be if it did? Nobody would be able to run it, and definitely nobody would have enough money to access it. The model stores learned weights that capture the statistical patterns of language, and uses them to generate what it predicts is the most likely output.

So it's literally not possible for it to commit plagiarism, because it doesn't contain the original text. For it to be accidental plagiarism, it would have to generate the exact same output with no memory of the original input, only a learned sense of how to produce comprehensible text.

To put it in other words, that would be like you writing a paragraph that is, word for word, a copy of someone else's paragraph, without you having any memory of said paragraph, only a vague idea of how to turn a bunch of random words into comprehensible text. The chances are slim to next to mathematically impossible.
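The "word for word by accident" argument can be made concrete with a toy calculation. It assumes each word is chosen independently, which real language models do not do; the point is only the shrinking product:

```python
# Toy illustration, not a model of how any real LLM samples text:
# if each word independently matched the source with some probability,
# the chance of an exact match of the whole passage shrinks
# exponentially with its length.

def p_exact_copy(p_per_word: float, n_words: int) -> float:
    """Probability that all n_words match, assuming each word
    independently matches with probability p_per_word."""
    return p_per_word ** n_words

# Even with a very generous 90% per-word match rate:
for n in (10, 50, 200):
    print(f"{n:>3} words: {p_exact_copy(0.9, n):.2e}")
```

By 200 words, the probability is below one in a billion under this toy assumption, which is the intuition behind "slim to next to mathematically impossible".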

Furthermore, almost none of these models have access to the internet, especially not ChatGPT or GPT-3. It's explicitly stated that the data cutoff is 2021, so they haven't even been trained on newer articles.

The most likely explanation, therefore, is that CNET employees were really lazy or naive and literally copy-pasted the other articles' text into ChatGPT or GPT-3, then wrote simple prompts such as "reword this for me". That's the true issue. I suspect this is the case because I've tried to reword text a few times with ChatGPT, and sometimes it just doesn't manage to remix the text without making it sound too similar to the original. This only happens when I feed it the text word for word and use a very lazy prompt. When I write a more elaborate prompt, it summarizes the text and avoids copying it, just like a human would if asked to summarize a text.

So this is what's going on, not other things. Knowing Reddit, even with this explanation it's unlikely that people are gonna believe me or be willing to do their own research. If you wanna prove me wrong, here's a challenge: make it generate an article about anything you like, then copy and paste parts of that article into Google search and see how many exact results come up.

4

Shiningc t1_j5qibie wrote

That doesn’t contradict his claim that “AI is just scraping existing writing”. Human intelligence doesn’t work in the same way. It’s just that at some point, humans know that something “makes sense” or “looks good”, even if it’s something that’s completely new, which is something that the current “AI” cannot do.

0

natepriv22 t1_j5qmutp wrote

It does though...

It's not scraping writing, it's learning the nuances and rules and the probabilities of it in the same way a human would.

The equivalent example would be a teacher telling you to "write a compare-and-contrast paragraph about topic X". The process of using existing understanding, knowledge, and experience is, on a general level, very similar to how current LLM AIs work. There's a reason they are called neural networks: who and what do you think they are modeled after?

0

Shiningc t1_j5qp1vn wrote

"Comparing and contrasting paragraphs" has an extremely limited scope, and it's not general intelligence.

An AI doesn't know that something "makes sense" or "looks good", because those are subjective experiences that we have yet to understand. And what "makes sense" to us is a subjective experience with no guarantee of actually, objectively making sense. What made sense to us 100 years ago may be complete nonsense today or tomorrow.

If 1000 humans are playing around with 1000 random generators, humans can eventually figure out what is “gibberish” and what might “make sense” or “sound good”.

1

Shiningc t1_j5nshxm wrote

"But but but that's how human intelligence works!"

3

UniversalMomentum t1_j5ou5yb wrote

Yeah.. but machine learning is only just getting useful, so you're projecting the limitations of a new technology long-term, as if the tech won't change. It probably will, and it probably will be able to go well beyond just combining pre-made content.

That being said, all human knowledge is plagiarized from the past; that's the innate, foundational process of science and knowledge. We aren't all supposed to figure everything out on our own so much as steal the successes of the past as fast as possible and apply them somehow. You're not supposed to re-invent the wheel; you're supposed to copy it and find smart uses for it.

Sometimes 'acquiring knowledge' just means organizing the data so you see the patterns; in fact, I'd say that's the majority of the time. AI is going to be pretty darn good at that, and the limits we see now are to be expected, rather than something to project decades into the future as if the tech will be standing still. People do that far too often: they speculate about all the negatives and almost gleefully ignore the positives. It skews humans' ability to project long-term quite a lot.

2

QuestionableAI t1_j5lphpu wrote

Sue the shit out of them each and every time ... I know I will.

Seriously, I have original works out there in articles and books and if I or anyone else finds my works being STOLEN for use by any MFer, my attorney has been wanting a new boat...

−4