Ronny_Jotten t1_j6i3uog wrote on January 30, 2023 at 2:25 PM

Reply to comment by CallFromMargin in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee

I don't know what paper you're referring to, but there's this one:

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

It clearly shows, at the top of the first page, the full Stable Diffusion model, trained on billions of LAION images, generating images that replicate its training data closely enough to be clearly "substantially similar" copyright violations. The paper also cites several other papers on the ability of large models to memorize their inputs.

It may be possible to tweak the generation algorithm so that it no longer outputs such similar images, but it's clear that the training images are still present in the trained network.

Mr_ToDo t1_j6j481z wrote on January 30, 2023 at 6:23 PM

Well, they did both in that paper. But it would be interesting to know what the ones at the top were from. I know that there's one I saw further down in high hit percents further down but with as nice as they are I don't know why the rest don't if they belong to that model.

Ronny_Jotten t1_j6kjrlv wrote on January 30, 2023 at 11:50 PM

The paper explains what the ones at the top were from. It's using Stable Diffusion 1.4. See page 7: Case Study: Stable Diffusion, page 14: C. Stable Diffusion settings, and page 15 for the prompts and match captions. Sorry, the rest of your comment is incomprehensible to me...

Mr_ToDo t1_j6mwtay wrote on January 31, 2023 at 1:50 PM

OK, that's on me. I hit the references and somehow thought I was done with the paper; I didn't think they would have the captions they used underneath that. I admit that was poor due diligence on my part. Apologies.