Submitted by MINE_exchange t3_126lfjd in Futurology
YummyMummy2024 t1_jeadq3t wrote
Reply to comment by DorkRockGalactic in Google Accused of Using ChatGPT Algorithms in Creating Its Neural Network by MINE_exchange
The content is public and free to access? The algorithm is IP. I could be wrong though.
FrowntownPitt t1_jeanox9 wrote
Just because something is free to access doesn't mean you have the right to do whatever you want with it, especially with regards to making derivative works without attribution or otherwise breaking license terms. This is what licenses and copyrights are for.
For example, if OpenAI scraped a code repository that uses a Creative Commons NonCommercial license and is using that code for monetary gain without the owner's consent, they're breaking that license. It'd have to be argued whether the fact that OpenAI used that code to train models which may generate code of similar likeness counts as distributing the source, and whether having a user use that model under a paid service counts as a commercial violation of those terms.
The algorithm is IP, yes. But GPT-X is part model, part training data.
YummyMummy2024 t1_jeaoukd wrote
No doubt those licenses were ignored, but without evidence how do you make that copyright claim? Without evidence, does that make it derivative? What do you think?
FrowntownPitt t1_jeaq30n wrote
I mean yeah I agree, enforcing something like this is going to be very very difficult. But there are several clear examples of something like DALL-E generating images very similar to or nearly identical to copyrighted IP.
IANAL, but I presume a claimant could establish to a court, with reasonable certainty, that licensed works were used in a way that breaks the license, at which point OpenAI (or really any AI company) would be responsible for defending their practice or non-use of those licensed works.
CalvinKleinKinda t1_jeb8blj wrote
"Generating" literal smudged watermarks from copyrighted content.
ShadoWolf t1_jeb05vb wrote
You signed over your rights to your content when you signed up to Reddit, or Facebook, or Google.
It's not like OpenAI is using some shoestring-budget web scraper built with Python and the BeautifulSoup library.
They have partnerships and requested the raw text data.
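For reference, that kind of shoestring scraper really is only a handful of lines. A minimal sketch, assuming the requests and beautifulsoup4 packages and a placeholder URL:

```python
# Hypothetical bare-bones scraper: fetch one page and pull out its paragraph
# text. The URL is a placeholder, not a real data source.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/some-public-page")
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
print("\n".join(paragraphs))
```

Training a model at GPT scale obviously takes far more than a one-off script like this.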
Particular-Way-8669 t1_jebcnyt wrote
You signed those rights away to these sites, not to OpenAI lol. It is still your IP. Someone else cannot just go and copy it because you posted it there; they will be hit with an infringement lawsuit. Reddit, Facebook, and Google received your permission to use it in certain ways. And yes, Google or Facebook can potentially claim they used that data fairly for their models. OpenAI? Not a chance.
ShadoWolf t1_jebhlhj wrote
Unfortunately, you're wrong:
- Your Content
The Services may contain information, text, links, graphics, photos, videos, audio, streams, or other materials (“Content”), including Content created with or submitted to the Services by you or through your Account (“Your Content”). We take no responsibility for and we do not expressly or implicitly endorse, support, or guarantee the completeness, truthfulness, accuracy, or reliability of any of Your Content.
By submitting Your Content to the Services, you represent and warrant that you have all rights, power, and authority necessary to grant the rights to Your Content contained within these Terms. Because you alone are responsible for Your Content, you may expose yourself to liability if you post or share Content without all necessary rights.
You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
Any ideas, suggestions, and feedback about Reddit or our Services that you provide to us are entirely voluntary, and you agree that Reddit may use such ideas, suggestions, and feedback without compensation or obligation to you.
Although we have no obligation to screen, edit, or monitor Your Content, we may, in our sole discretion, delete or remove Your Content at any time and for any reason, including for violating these Terms, violating our Content Policy, or if you otherwise create or are likely to create liability for us.
Particular-Way-8669 t1_jebiwfy wrote
Why do you even bother copying something without reading it?
"You retrain any ownership rights..."
End of Story, I am right. It says exactly what I said it did. You grant Reddit (and only Reddit) rights to manipulate with your content as written in TOS. You do not grant it to anyone else. If Reddit partners with someone then they would also be included if Reddit gave them that right. But this is not what happened. OpenAI scrapped internet. There was no partnership with reddit or anyone whatsoever.
ShadoWolf t1_jebs6px wrote
You retain ownership... but you more or less signed over all rights in what they can do with said information... it's right there in the highlighted text.
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit << this is the part that lets them hand it over to companies like OpenAI
Particular-Way-8669 t1_jebu396 wrote
Again, if Reddit trained their own AI on users' data or gave that data to OpenAI as part of a contract, then you would have a point. But this is not what happened. OpenAI did not ask anyone. They ran data-crawling scripts and stole data without asking. It is nothing like what Reddit is doing. You did not sign anything off to OpenAI.
johndburger t1_jebyej1 wrote
> this is the part that lets them hand it over to companies like OpenAI
Your claim is that OpenAI has negotiated usage rights from every single site it’s gotten data from? Do you have any evidence for this?
khamelean t1_jeaogv3 wrote
Just because something is free to access, doesn't mean you are allowed to remember it or learn from it in any way!!
AbeWasHereAgain t1_jeat78b wrote
Go ask Vanilla Ice what happens when your music sounds a little too close to the original.
OpenAI, and Microsoft, are 100% violating terms of use for the vast majority of the stuff they scraped.
khamelean t1_jeautqo wrote
All musicians learn from hearing other music.
There is a difference between learning and copying.
No_Character_8662 t1_jeaxen9 wrote
So if I call something in my process "learning" I'm free to use it? I'm learning copies of your works on my printer to sell right now
Edit: to be clear I don't know what the answer is but that seems simplistic
Numai_theOnlyOne t1_jeb9lah wrote
Tbh, can we separate human learning from AI learning?
A human is an imperfect biological being that requires time and repetition to learn.
AI needs just a large pool of data and can do the same as millions of humans in a fraction of the time.
I think that's not the same kind of learning, and it's a thing that honestly should be questioned. After all, our content was created with humans in mind and was not meant to be used for AI.
khamelean t1_jeaztbb wrote
The learning isn’t the problem, the selling is.
Newfondahloose t1_jeb49yx wrote
They are selling their own work. There’s only so many ways you can answer a question. Just because you’ve answered the question before, doesn’t mean someone else can’t come to the same conclusion when answering for themselves.
AbeWasHereAgain t1_jeaxmj7 wrote
lol - you don't think ChatGPT is spitting out insanely close replicas of other people's work daily?
khamelean t1_jeazo62 wrote
Nothing wrong with playing/singing other people’s songs, I sing along to the radio in my car all the time.
AbeWasHereAgain t1_jeazzbi wrote
ha ha ha - yeah, totally the same thing. Just an FYI, artists are required to pay when they do a cover.
Everything changes when you start making money off other people's work.
khamelean t1_jeb0muo wrote
That’s exactly my point.
AbeWasHereAgain t1_jeb0qkx wrote
What is your point?
khamelean t1_jeb2qq4 wrote
It’s not a problem until you start making money off other people’s work.
Space_Pirate_R t1_jeb6yrh wrote
Are monetized AI artists paying royalties to everyone whose art was scraped off the web?
khamelean t1_jebg7rh wrote
Are human artists paying royalties to everyone whose art they scraped off the web??
Space_Pirate_R t1_jebi0au wrote
Human artists learning from others' work is obviously "fair use." I don't think a corporation will successfully deploy that in defense of training a commercial AI.
khamelean t1_jebj0yq wrote
Just looking at a piece of art is enough to encode it into a human’s neural network. Why should it be any different for an artificial neural network? If it’s free to access then it’s free to access.
Space_Pirate_R t1_jebovpk wrote
I don't believe that an artificial neural network is morally or legally equivalent to a human. If I did believe that, then there would be more pressing issues than copyright infringement to deal with, such as corporate enslavement of AIs.
khamelean t1_jebrxm4 wrote
What does moral or legal equivalence to humans have to do with anything?
The point is that all AI has to do to learn from art is look at it. If someone makes their art free to look at, then it’s free for an AI to look at.
Space_Pirate_R t1_jebta1s wrote
AIs don't have agency. The AI is a tool which is being operated by a corporate entity. The corporate entity is governed by existing laws, and requires a license to use a copyright work in the operation of their business.
khamelean t1_jebxbzr wrote
So companies have to pay a licensing fee to every artist whose work the employees of that company have ever looked at?? Yeah, I don’t think that’s how it works.
Space_Pirate_R t1_jebzo2e wrote
No, because (as I mentioned earlier) there is a fair use exemption which allows humans to be educated using copyright works. However, there is no such exemption allowing corporations to train AI using copyright works.
khamelean t1_jec1fmg wrote
Education is irrelevant in this context. The copyrighted works people consume through education are a tiny fraction of the total number of copyrighted works that most people experience through their lives. And all of those experiences contribute to that person’s capabilities.
The exemption for educational purposes is for presenting copyright material to students in an educational setting. It has nothing to do with copyright work that the student might seek out themselves.
Space_Pirate_R t1_jec5lal wrote
Yes, humans experience copyright works and learn from them, and that's fair use. What does that have to do with training an AI?
A person or corporation training an AI is covered by normal copyright law, which requires a license to use the work.
khamelean t1_jec837g wrote
How is it any different to an employee “using” the work? Corporations don’t pay licensing when an employee gets inspired by a movie they saw last night.
Why do you keep mentioning corporations? An AI could just as easily be trained by an individual. I’ve written and trained a few myself.
Space_Pirate_R t1_jec8j1p wrote
>Corporations don’t pay licensing when an employee gets inspired by a movie they saw last night.
The employee themselves paid to view the movie. The copyright owner set the amount of compensation knowing that the employee could retain and use the knowledge gained. No more compensation is due. This is nothing like a person or corporate entity using unlicensed copyright works to train an AI.
>Why do you keep mentioning corporations? An AI could just as easily be trained by an individual. I’ve written and trained a few myself.
Me too. I keep saying "person or corporation training an AI" to remind us that the law (and any moral judgement) applies to the person or corporate entity conducting the training, not to the AI per se, because the AI is merely a tool and is without agency of its own.
khamelean t1_jecbi7y wrote
“What does that have to do with a person or corporate entity training an AI?”
Training a human neural network is analogous to training an artificial neural network.
Whether the employee paid to watch a movie doesn’t matter; they could have just as easily watched something distributed for free. The transaction to consume the content is, as you said, irrelevant to the corporation.
An AI consuming a copyright work is no different to a human consuming a copyright work. If that work is provided for free consumption, why would the owner of the AI have to pay for the AI to consume it?
Space_Pirate_R t1_jecfcfy wrote
>Training a human neural network is analogous to training an artificial neural network.
By definition, something analogous is similar but not the same. Lots of things are analogous to others, but that doesn't even remotely imply that they should be governed by the same laws and morality.
>An AI consuming a copyright work is no different to a human consuming a copyright work.
A human consuming food is no different to a dog consuming food. Yet we have vastly different laws governing human food compared to dog food. Dogs and AI are not humans, and that is the difference.
>If that work is provided for free consumption, why would the owner of the AI have to pay for the AI to consume it?
If that work is provided for free consumption, why would the owner of a building have to compensate the copyright owner to print a large high quality copy and hang it on a public wall in the lobby? The answer is that the person (not the AI) is deriving some benefit (beyond fair use) from their use of the copyrighted work, and therefore the copyright owner should be compensated.
khamelean t1_jecru6d wrote
The building owner is using a replication of the copyrighted work. The owner should absolutely compensate the original creator.
But the printing company that the building owner hires to print the poster doesn’t owe the original creator anything. Even though it is directly replicating copyrighted work, and certainly benefiting from doing so. If the printer were selling the copyrighted works directly then that would be a different matter and they would have to compensate the original copyright owner. So clearly context matters.
An AI doesn’t even make a replication of the original work as part of its training process.
If the AI then goes on to create a replication, or a new work that is similar enough to the original that copyright applies, and that work is used in a context where copyright would apply, then absolutely, that would constitute a breach of copyright.
It is the work itself that is copyrighted, not the knowledge/ability to create the work. It’s the knowledge of how to create the work which is encoded in the neural network.
Lots of people benefit from freely distributed content. Simply benefiting from it is not enough to justify requiring a license fee.
Hypothetically speaking, let’s say a few years down the line we have robot servants. I have a robotic care giver that assists me with mobility. Much as I may have a human care giver today.
If I go to the movies with my robot care giver, they will take up a seat, so I would expect to pay for a ticket, just as I would for a human care giver. Do I then need to pay an extra licensing fee for the robot’s AI brain to actually watch the movie?
What if it’s a free screening? Should I still have to pay for the robot brain to “use” the movie?
Is the robot “using” the movie in some unique and distinct way compared to how I would be “using” the movie?
Newfondahloose t1_jeb3tmh wrote
It’s learning and using language to answer questions. There’s only so many ways you can answer the same question. Greed getting in the way of progress, as always. Guess professors should give a citation every time they give a verbal answer even though they are answering from memory.
Particular-Way-8669 t1_jebcdng wrote
There is a difference between a human, who can be creative, and a computer program that creates aggregations. Completely different thing. AI does not really learn. It adjusts its mathematical functions based on data.
khamelean t1_jebgej7 wrote
No, there is no difference. Creativity is just combination and random mutation. It’s how humans are creative, it’s how machines are creative. It’s the same thing.
Particular-Way-8669 t1_jebh4n3 wrote
This is utter bullshit. There was always some human who came up with something first, when there was nothing like it before. The AI technology we know does not have this ability, and never will. It is only data aggregation, nothing else. A human does not need data from other humans to be creative, and the very fact that there was someone who climbed down from the trees and picked up the first fire is proof of that.
khamelean t1_jebi03n wrote
Combination + mutation. It allowed evolution through natural selection to give us every life form on earth. Creativity works exactly the same way.
johndburger t1_jeby23l wrote
ChatGPT has “learned” some generalizations from the text that it’s processed, but it has also literally memorized (i.e., copied) billions of words from it.
khamelean t1_jec048i wrote
Technically it remembers the relationships between words; those relationships are encoded in its neural network. It doesn’t just copy the text.
https://en.m.wikipedia.org/wiki/Transformer_(machine_learning_model)
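To make "relationships between words" concrete, here is a toy sketch of the scaled dot-product attention a transformer is built around. The numbers are random placeholders, nothing from GPT itself; it only shows the kind of computation involved:

```python
# Toy scaled dot-product attention with NumPy. Each token's output is a
# weighted mix of every token's value vector; what training shapes is the
# projection matrices (how tokens relate), not a stored copy of the text.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                    # 4 tokens, 8-dim embeddings (toy sizes)
x = rng.normal(size=(seq_len, d_model))    # stand-in token embeddings

W_q = rng.normal(size=(d_model, d_model))  # "learned" projections, random here
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_model)        # pairwise token affinities
scores -= scores.max(axis=-1, keepdims=True)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                       # each token attends to all the others
print(weights.round(2))                    # the "relationships" between tokens
```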
Numai_theOnlyOne t1_jeb91dq wrote
Yeah, and it makes sense for a human, but I can see this being an issue with AI and how fast it can learn.
After all, suddenly whatever I posted anywhere is being used to generate revenue, when it was originally aimed at people, for free, to get responses for free. AI, though, usually requires you to pay for it. So why shouldn't they pay me to use my data? Sure, maybe someone made money off my response, and I might buy some of their stuff; that's fine, because it wasn't only because of my input, unlike AI, which only works because of the data. Same with artists: they were posting stuff for free not to be used for free, but to present their art and land a job. You also can't just rip an image from the internet and use it in a commercial because "it was freely available on the internet".
thurken t1_jeayknu wrote
We're talking about ethics here, not unethical legal loopholes
Newfondahloose t1_jeb4uk7 wrote
Ethics are different for everyone. I find it unethical to hold back society just because you want to be referenced or given 5 cents for your shitty, regurgitated blog post.
thurken t1_jeb60db wrote
That was kind of the opposite of my point: that OpenAI would have some nerve to be mad at Google for using ChatGPT to generate training data when they used everyone's data to get training data.
Particular-Way-8669 t1_jebc3ih wrote
Everything free to access that is not released under a copyright-friendly license is by definition the IP of the one who put it out. Even if you take a picture and put it on Facebook, it is your IP. Facebook might have a TOS that says they have the right to do certain things with what you post on their site. Sure. But you gave them that permission by agreeing to it. OpenAI never received any permission from anyone. Period.
beingsubmitted t1_jecici1 wrote
The algorithm is barely IP, and the data is the bigger part of its success.
ChatGPT is a reinforcement-learning-tuned transformer. The ideas and architecture it's built on aren't proprietary. The specific parameters are, but that's not actually that important: the size and number of layers, for example. Most people in AI can make some assumptions. Probably ReLU, probably Adam, etc. Then there are different knobs you can twiddle, and with some trial and error you dial it in.
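To give a sense of those knobs, here is a hypothetical PyTorch sketch. None of these values are OpenAI's actual settings; they're just placeholders:

```python
# Hypothetical "knobs" for a small transformer: depth, width, attention heads,
# activation, optimizer settings. All values are illustrative guesses.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6        # width, attention heads, depth

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model,
    nhead=n_heads,
    dim_feedforward=4 * d_model,
    dropout=0.1,
    activation="relu",                         # "probably ReLU"
    batch_first=True,
)
model = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # "probably Adam"

x = torch.randn(2, 16, d_model)                # (batch, sequence, embedding)
print(model(x).shape)                          # torch.Size([2, 16, 512])
```

Dialing it in is mostly trial and error over values like these.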
The size and quality of your training data is way more important, and in the case of ChatGPT, so is your compute power. Lots of people can design a system that big; it's as easy as coming up with big numbers. But training it takes a ton of compute power, which costs money, which is why not just anyone has already done it, if it's supposedly so easy.
It should also be said that GPT is a bit of a surprise success. Before models this size, it was a big risk. You're gonna spend millions to train a model, and you won't know until it's done how good it will be.
Most advancements in AI are open source and public. Those all help advance the field, but at the same time, it's also about taking a bit of a risk, and waiting to see how it pans out before taking the next risk.
Also, there's transfer learning. If you spend a hundred million training a model, I can use your trained model and a fraction of the money to make my own.
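A minimal sketch of that idea, assuming the Hugging Face transformers library and the public GPT-2 checkpoint; it isn't how any particular company actually does it:

```python
# Hypothetical transfer learning: start from a pretrained model, freeze most
# of it, and fine-tune only a small slice on new data at a fraction of the cost.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

for param in model.parameters():                     # freeze the expensive part
    param.requires_grad = False
for param in model.transformer.h[-1].parameters():   # unfreeze the last block
    param.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

batch = tokenizer("Some new domain-specific text.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```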
It's like if you laboriously took painstaking measurements to figure out an exact kilogram and craft a 1kg weight. You didn't invent the kilogram, difficult as it was to make it. If I use yours to make my own, I'm not infringing on your IP.