ComplexColor t1_iuzt3zw wrote
I'm not very familiar with generative models. Are there explicit or implicit "techniques" that would prevent the model from plagiarizing the training material? Otherwise, it seems rather problematic to claim copyright on what could be an existing piece of art.
I realize that the likelihood might be infinitesimal, but after billions and billions of generations, some unlikely but clearly plagiarized works could be produced.
Saytahri t1_iv0ikmm wrote
They made a blog post about this. They generated a lot of samples and checked for matches in the dataset. There were some, mostly very simple vector art that was duplicated many times over in the dataset.
They removed the duplicates and then checked again: no matches.
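(For the curious, a minimal sketch of what such a duplicate check could look like, using perceptual hashing via the `imagehash` package. The blog post's actual matching method may well differ, and the file paths below are just placeholders.)

```python
# Minimal sketch of a near-duplicate check between generated samples and
# training images, using perceptual hashes. Illustrative only: the actual
# method may differ (e.g. embedding distance instead of hashing).
from PIL import Image
import imagehash

train_paths = ["train/0001.png", "train/0002.png"]  # placeholder paths
sample_paths = ["samples/0001.png"]                 # placeholder paths

def phash(path):
    """Perceptual hash of an image; visually similar images hash similarly."""
    return imagehash.phash(Image.open(path))

train_hashes = {p: phash(p) for p in train_paths}
for sample in sample_paths:
    h = phash(sample)
    for train_path, th in train_hashes.items():
        # Hash difference is a Hamming distance; small means near-duplicate.
        if h - th <= 4:
            print(f"{sample} looks like a near-duplicate of {train_path}")
```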
disturbing_nickname t1_iv0jdjg wrote
Your comment helped me realize that scientists will probably soon show that AI creates more original content than a human does, by analyzing the creative output. Fascinating thought…
petseminary t1_iuzw2mv wrote
I'd go a step further and question how you can copyright these outputs if you don't own everything the model was trained on.
C0DASOON t1_iv047p4 wrote
The same way a human artist can copyright a piece of art they made after drawing inspiration from other people's art.
pdillis t1_iv0z284 wrote
I've been using AI/neural networks to make art since 2018, and this is the argument that has (very recently) gained a lot of popularity in defense of AI art, but it baffles me the most. A human artist and a neural network are not the same: the NN is just a tool, which is why the user is still considered the artist. Giving human qualities to the NN whenever convenient is a detriment to the movement as a whole.
C0DASOON t1_iv1622m wrote
Stating that a model which uses existing art only to update its parameters should not need special permission to be exposed to that art, by analogy to how human artists do not need permission to do so, is not giving human qualities to a model. Unless, that is, your argument is that the only reason humans don't need permission to view or take inspiration from art is that we make a special exception for viewing and inspiration when performed by human beings, and that otherwise all exposure to art requires permission from the copyright holder, which is just as stupid as the existence of copyright in the first place. You do not, and should not, need special permission to use art, or anything else, to update model parameters.
petseminary t1_iv1b96a wrote
AI does not draw inspiration. Seeing something and being inspired by it is human. Processing lots of photos of artworks to produce similar works rehashes that data in a fundamentally different way.
kaibee t1_iv1kckb wrote
>AI does not draw inspiration. Seeing something and being inspired by it is human. Processing lots of photos of artworks to produce similar works rehashes that data in a fundamentally different way.
So like, Stable Diffusion: the model is 4 GB and can be reduced to 2 GB without much loss in quality, and it was trained on ~5 billion images. Since 1 gigabyte is a billion bytes, that works out to well under one byte of model per training image, while each raw 512x512x3 image is ~786 KB. This is transformative, so fair use is a valid defense, imo.
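Quick back-of-envelope, assuming the 2 GB checkpoint and ~5 billion training images mentioned above (the numbers are rough):

```python
# Rough arithmetic behind the claim above: how many bytes of model
# capacity exist per training image? Numbers are approximate.
model_bytes = 2e9                 # ~2 GB reduced checkpoint
num_images = 5e9                  # ~5 billion training images
raw_image_bytes = 512 * 512 * 3   # 786,432 bytes per uncompressed image

bytes_per_image = model_bytes / num_images   # 0.4 bytes per image
compression_factor = raw_image_bytes / bytes_per_image

print(f"{bytes_per_image:.2f} bytes of model per training image")
print(f"~{compression_factor:,.0f}x effective 'compression'")  # ~1,966,080x
```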
petseminary t1_iv1lbgu wrote
It ain't shit without all the human effort that went into creating the training data. To my displeasure, I think the law will see it your way, but I don't think people should be so flippant about marginalizing over so much human creative effort. I have no problem with acquiring the rights to photos to train image generators, because that's the true cost of these products. It has nothing to do with final file size.
kaibee t1_iv1rktu wrote
> It ain't shit without all the human effort that went into creating the training data. To my displeasure, I think the law will see it your way, but I don't think people should be so flippant about marginalizing over so much human creative effort. I have no problem with acquiring the rights to photos to train image generators, because that's the true cost of these products. It has nothing to do with final file size.
I'm not sure what you mean by 'marginalizing'. The contribution of the artists is valid and necessary. I know a lot of the "common folk" in the SD community enjoy that some artists are upset by this whole thing, but I think, on the whole, the community is supportive of artists.
Though, I do have another angle here: copyright is absolutely out of control, and the vast majority of it at this point is accruing for the benefit of Disney, as a result of lobbying by Disney and others. I think it is fundamentally absurd that children can grow up with beloved characters and die of old age before the copyright on those characters expires. And that's kind of the whole issue here, right? If artists wanted a 20-year copyright term on something, I think that would be good and reasonable. They should be able to exclude their images from training data. I'd even be in favor of going as far as to say that there should be some associated metadata to facilitate that, that the government should enforce compliance, that artists should be able to sue, etc., the whole nine yards.
But let's even say we keep copyright as it is: death of the author plus however many decades. Even if you could enforce the law (I can't even imagine how you would, especially in the coming years), all this does is push the problem out for artists until either models get better at learning from less data (so that you can make do with the far more limited amount of training data you can buy the rights for) or enough data enters the public domain.
The Luddites weren't wrong; they really did suffer as a result of technological disruption. As with all things, the solution is a basic income funded by a land-value tax.
petseminary t1_iv26lvl wrote
I agree with you here. I think a reasonable example is the Wayback Machine. It's very useful for archiving web content that has disappeared for whatever reason (usually a lapse in web hosting), but if site/content creators want their content excluded, the Wayback Machine operators are very responsive and will stop hosting it. I anticipate that asking for your content to be excluded from training sets after the fact will be received much less pleasantly, as the model would have to be retrained, and that is expensive.
Living-Substance-668 t1_iv22uy0 wrote
That may be, but either way there has been a dramatic transformation of the original works. Copyright is not an infinitely extended ownership right over information. It is a special exception (to free speech and press) that we offer conditionally to encourage people to produce things, by allowing them to profit exclusively from their production, like patents. Copyright does not prohibit producing a "similar" work to a copyrighted one, or using similar techniques, or else every drawing of a soup can would owe royalties to Andy Warhol.
jarkkowork t1_iuzyk96 wrote
Probably similar chances of that happening as with humans, whose creativity is largely based on subconsciously mimicking works they have already seen.
hybridteory t1_iv0f5y5 wrote
Yes, I find it incredibly strange that when speaking about Codex, everyone is worried about the models regurgitating the code they were trained on, citing the GPL and other licenses; but this seems to be much less of an issue when it comes to images (going by anecdotal evidence from these discussions), even though images have licenses too. It just goes to show that humans perceive text and images very differently from a creative point of view.
farmingvillein t1_iv2bbw6 wrote
- If there can be a lawsuit, there eventually certainly will be one.
- The issues here are, for now, different. The current claim is that Codex is copy-pasting things that need licenses attached (whether this is true will, of course, be played out in court). For image generation, no one has yet claimed that these systems are emitting straight copies, at any meaningful scale, of someone else's original pictures.
hybridteory t1_iv2ebe5 wrote
Codex is not technically copy-pasting; it is generating a new output that is (almost) exactly the same as the input, or indistinguishable from it to a human's eyes. It sounds like semantics, but there is no actual copying. Music-generating algorithms can already produce short samples indistinguishable from their training inputs (memorisation). Dall-E 2 is not there yet, but we are close to being able to prompt "Original Mona Lisa painting" and get back something strikingly similar to the original. There are already several generative models of images that can mostly memorise the inputs used to train them (a quick example found using Google: https://github.com/alan-turing-institute/memorization).
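(A rough sketch of how this kind of memorisation check is often done: embed generated and training images with CLIP and flag generations whose nearest training neighbour is suspiciously close. The `open_clip` usage, the 0.95 threshold, and the file paths are my own illustrative choices, not the method of the linked repo.)

```python
# Sketch of a memorisation check: flag generated images whose nearest
# training image is almost identical in CLIP embedding space.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
model.eval()

def embed(paths):
    # Stack preprocessed images, return L2-normalised CLIP features.
    batch = torch.stack([preprocess(Image.open(p)) for p in paths])
    with torch.no_grad():
        feats = model.encode_image(batch)
    return feats / feats.norm(dim=-1, keepdim=True)

train_paths = ["train/0001.png", "train/0002.png"]  # placeholder paths
gen_paths = ["samples/0001.png"]                    # placeholder paths

train_feats = embed(train_paths)
gen_feats = embed(gen_paths)

sims = gen_feats @ train_feats.T        # cosine similarities
best_sim, best_idx = sims.max(dim=1)    # nearest training image per sample
for i, (s, j) in enumerate(zip(best_sim.tolist(), best_idx.tolist())):
    if s > 0.95:  # near-duplicate in embedding space => likely memorised
        print(f"{gen_paths[i]} ~ {train_paths[j]} (cos sim {s:.3f})")
```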
farmingvillein t1_iv2vqmx wrote
> Codex is not technically copy-pasting; it is generating a new output that is (almost) exactly the same as the input, or indistinguishable from it to a human's eyes.
Nah, it is literally generating duplicates. This is copying, in the eyes of the law. Whether this is an actual legal problem remains to be seen.
> Dall-E 2 is not there yet, but we are close to being able to prompt "Original Mona Lisa painting" and get back something strikingly similar to the original.
This is confused. Dall-E 2 is "not there yet", as a general statement, because they have specifically trained it not to do this.
hybridteory t1_iv30cij wrote
There is nothing about diffusion models that stops them from memorising data. Dall-E 2 can definitely memorise.
farmingvillein t1_iv38uzt wrote
That is my point? I'm not sure how to square your (correct) statement here with your earlier one:
> Dall-E 2 is not there yet, but we are close to being able to prompt "Original Mona Lisa painting" and get back something strikingly similar to the original