Submitted by pm_me_your_pay_slips t3_10r57pn in MachineLearning
Argamanthys t1_j6w9gal wrote
Reply to comment by HateRedditCantQuitit in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
There is a short story called "The Library of Babel" about a vast library that contains every possible permutation of a 1,312,000-character book. It is not hard to recreate that library in code (see the sketch at the end of this comment). You can explore it if you want.
Contained within that library is a copy of every book ever written, freely available to read.
Is that book piracy? It's right there if you know where to look.
That's pretty much what's going on here. They searched the latent space for an image and found it. But that's because the latent space, like the Library of Babel, is really big and contains not just that image but also near-infinite permutations of it.
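To make the analogy concrete, here is a minimal sketch of such a library in Python. The alphabet and book length are simplifications (Borges specifies 25 orthographic symbols and 1,312,000 characters per book); the point is that text and address are a perfect bijection, so every text is "already there" at a computable location:

```python
# Toy Library of Babel: every fixed-length text is in bijection with an
# integer "shelf address". Borges uses 25 symbols and 1,312,000-character
# books; this sketch uses 29 symbols (26 letters, space, comma, period)
# and whatever length you like.

ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."
BASE = len(ALPHABET)  # 29

def address_of(text: str) -> int:
    """Text -> its unique integer address (big-endian base-29)."""
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    return n

def text_at(address: int, length: int) -> str:
    """Integer address -> the text shelved there."""
    chars = []
    for _ in range(length):
        address, digit = divmod(address, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

line = "a copy of every book ever written"
addr = address_of(line)
assert text_at(addr, len(line)) == line  # round-trips exactly
print(f"that line has always been at address {addr}")
```

Nothing is stored anywhere; the library is pure arithmetic, which is why every text is trivially "in" it.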
SuddenlyBANANAS t1_j6waypu wrote
If diffusion models defined a perfect bijection between the latent space and the space of possible images, that would make sense, but they obviously don't. If you could repeat this procedure and find exact duplicates of images which were not in the training data, you'd have a point.
starstruckmon t1_j6xbhe1 wrote
>find exact duplicates of images which were not in the training data, you'd have a point
The process isn't exactly the same, but isn't this how all the diffusion-based editing techniques work?
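That is essentially the SDEdit-style recipe: partially noise a real image, then denoise it, which lands you at a point in latent space that decodes to an image the model never trained on. A rough sketch using the Hugging Face diffusers img2img pipeline (the model checkpoint and file names here are just illustrative placeholders):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Illustrative checkpoint; any Stable Diffusion checkpoint would do.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "my_photo.png" stands in for an image that was not in the training set.
init = Image.open("my_photo.png").convert("RGB").resize((512, 512))

edited = pipe(
    prompt="the same scene as a watercolor painting",
    image=init,
    strength=0.6,        # 0 = return the input unchanged, 1 = ignore it
    guidance_scale=7.5,  # classifier-free guidance weight
).images[0]
edited.save("edited.png")
```

The edit works precisely because the latent space contains good representations of images that were never in the training data.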
WikiSummarizerBot t1_j6w9h7w wrote
>"The Library of Babel" (Spanish: La biblioteca de Babel) is a short story by Argentine author and librarian Jorge Luis Borges (1899–1986), conceiving of a universe in the form of a vast library containing all possible 410-page books of a certain format and character set. The story was originally published in Spanish in Borges' 1941 collection of stories El jardín de senderos que se bifurcan (The Garden of Forking Paths). That entire book was, in turn, included within his much-reprinted Ficciones (1944).
maxToTheJ t1_j6x4vrz wrote
> That's pretty much what's going on here.
No, it's not. We wouldn't need training sets if that were the case; in the scenario described, you could generate the dataset with a known algorithm.