Comments
tripple13 t1_ivqpwft wrote
This.
I guess that's what's remarkably fascinating about these models.
Albeit, in essence, you are putting a prior on the training set, thus there should be some limit to the manifold from which samples are generated.
[deleted] OP t1_ivrk9m6 wrote
[deleted]
Delacroid t1_ivslx97 wrote
Yes but you can make the latent dimension as high as you want.
bloc97 t1_ivpzu4j wrote
Theoretically, the upper bound on the number of distinct images grows exponentially with the number of bits required to encode each latent: a 64x64x4 latent with each element stored as a 32-bit number would amount to (2^32)^(64x64x4) images. However, many of those combinations are not considered to be "images" (they are "out of distribution"), so the real number might be much, much smaller than this, depending on the dataset and the network size.
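To get a feel for the size of that bound, here is a minimal Python sketch that counts the bits in such a latent and the number of decimal digits in the resulting upper bound. The 64x64x4 shape and the 32-bit width come from the comment above; the variable names are just illustrative.

```python
import math

# Latent shape and bit width taken from the comment above.
latent_elements = 64 * 64 * 4        # values per latent
bits_per_element = 32                # each value stored as a 32-bit number

# Total bits needed to encode one latent.
total_bits = latent_elements * bits_per_element

# The upper bound is 2**total_bits distinct bit patterns.
# Count its decimal digits with logarithms instead of building the number.
decimal_digits = math.floor(total_bits * math.log10(2)) + 1

print(f"bits per latent: {total_bits:,}")
print(f"upper bound has about {decimal_digits:,} decimal digits")
```

This only counts bit patterns; it says nothing about how many of them decode to in-distribution images.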
[deleted] OP t1_ivqfqvz wrote
[deleted]
bloc97 t1_ivqgf0q wrote
I was considering an unconditional latent diffusion model, but for conditional models, the computation becomes much more complex (we might have to use Bayes here). If we use Score-Based Generative Modeling (https://arxiv.org/abs/2011.13456), we could try to find and count all the unique local minima and saddle points, but it is not clear how we can do this...
Professional-Ebb4970 t1_ivr476b wrote
You don't need to use a single seed for the noise patch; you can use random numbers and it will work just fine.
dojoteef t1_ivou9rr wrote
No one can answer that question, since not all possible output images are equally probable (some are even impossible given trained network weights). You might be able to make an empirical estimate, but enumerating the true output space of any sufficiently complex NN is an open problem.
Nmanga90 t1_ivpsuae wrote
Well, if we are talking just about the output of any diffusion model, at 512x512 we get 262,144 pixels.
Each pixel can have a range of 0-255 for R, G, and B
256^3 = 16,777,216 (possible combinations for each pixel)
So then we get 16,777,216 ^ 262,144
This is an unimaginably large number, but it’s important to note that many of these images will appear to be exactly the same as one another due to our perception of color.
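As a rough check on the arithmetic in this comment, the sketch below recomputes the per-pixel color count and estimates how many decimal digits the final number has. It assumes 8-bit RGB at 512x512, as stated above; the variable names are illustrative.

```python
import math

# 512x512 output with 8 bits per R, G, B channel, as in the comment above.
pixels = 512 * 512
colors_per_pixel = 256 ** 3

# The count of distinct images is colors_per_pixel ** pixels.
# Estimate its decimal digit count with logarithms rather than printing it.
decimal_digits = math.floor(pixels * math.log10(colors_per_pixel)) + 1

print(f"pixels: {pixels:,}")
print(f"colors per pixel: {colors_per_pixel:,}")
print(f"number of distinct images has about {decimal_digits:,} decimal digits")
```

The result has on the order of 1.9 million decimal digits, which is why it is easier to talk about the exponent than the number itself.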
behold_s t1_ivotsdx wrote
Today I just watched the video about Stable Diffusion on the Two Minute Papers YouTube channel. I thought I understood the core of the subject, until I saw this post 😂. What is one awesome thing you think these advancements could lead to?
dasayan05 t1_ivpmx7r wrote
There is no way to feasibly compute what you are asking for.
Diffusion models (in fact, any modern generative model) are defined on a continuous image space, i.e. a continuous vector with 512x512 entries. This space is not discrete, so there isn't even any notion of "distinct images". A tiny continuous change can lead to another plausible image, and there are (theoretically) infinitely many tiny changes you can apply to an image to produce another image that looks the same but isn't the same point in image space.
The (theoretically) correct answer to your question would be that there are infinitely many images you can sample from a given generative model.
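To illustrate the continuous-versus-discrete point, here is a small Python sketch with made-up values: two "images" that are different points in continuous space but become identical once quantized to 8-bit levels. The `to_uint8` helper and the three-value toy images are assumptions for illustration, not part of any diffusion library.

```python
def to_uint8(x):
    """Quantize a float intensity in [0, 1] to an 8-bit level (0-255)."""
    return max(0, min(255, round(x * 255)))

# Two toy "images" of three intensities each; b is a tiny perturbation of a.
a = [0.2, 0.4, 0.6]
b = [v + 1e-4 for v in a]

# Different points in continuous image space...
distinct_as_floats = a != b

# ...but the same image once saved with 8 bits per value.
same_after_quantization = [to_uint8(v) for v in a] == [to_uint8(v) for v in b]

print(distinct_as_floats, same_after_quantization)
```

So the count is infinite in the model's continuous output space and finite (though enormous) only after quantizing to a fixed bit depth.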