
grid_world OP t1_isx22tm wrote

I have been running some experiments on toy datasets (MNIST, CIFAR-10), and so far it seems that very few of the latent variables z, as measured by their mu and logvar vectors, are ever close to 0. Mathematically this makes sense, since every latent variable will learn at least some information that is not garbage (standard Gaussian). So choosing the optimal latent space dimensionality is still eluding me.
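
A minimal sketch of such a check, assuming the encoder outputs for the whole dataset have been stacked into two NumPy arrays, `mu` and `logvar`, of shape (num_samples, latent_dim). The helper name `near_prior_dims`, the tolerance `tol`, and the toy arrays are made up for illustration and are not taken from the experiments above.

```python
import numpy as np

def near_prior_dims(mu, logvar, tol=0.05):
    """Boolean mask of latent dimensions whose mu and logvar both stay near 0.

    mu, logvar: arrays of shape (num_samples, latent_dim), assumed to be
    collected from the encoder. tol is an arbitrary illustrative threshold.
    """
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    # Average magnitude over the dataset, one value per latent dimension.
    mean_abs_mu = np.abs(mu).mean(axis=0)
    mean_abs_logvar = np.abs(logvar).mean(axis=0)
    return (mean_abs_mu < tol) & (mean_abs_logvar < tol)

# Toy stand-in for encoder outputs (hypothetical numbers):
# dimension 0 carries signal, dimension 1 sits at the standard Gaussian.
rng = np.random.default_rng(0)
mu = np.stack([rng.normal(0.0, 2.0, 1000), rng.normal(0.0, 0.01, 1000)], axis=1)
logvar = np.stack([np.full(1000, -3.0), rng.normal(0.0, 0.01, 1000)], axis=1)

mask = near_prior_dims(mu, logvar)
print(mask, int(mask.sum()), "of", mask.size, "dimensions look unused")
```

The threshold is a judgment call; a dimension can have a mean mu near 0 and still vary from sample to sample, which is why the mean of the absolute value is used here rather than the plain mean.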


grid_world OP t1_ischbjf wrote

I don’t think that the Gaussians are being output by a layer. In contrast with an Autoencoder, where a sample is encoded to a single point, in a VAE, due to the Gaussian prior, a sample is now encoded as a Gaussian distribution. This is the regularisation effect which enforces this distribution in the latent space. It cuts both ways, meaning that if the true manifold is not Gaussian, we still assume and therefore force it to be Gaussian.

A Gaussian signal being meaningful is something that I wouldn’t count on. Diffusion models are a stark contrast, but we aren’t talking about them. The farther a signal is away from a standard Gaussian, the more information it’s trying to smuggle through the bottleneck.
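
One way to put a number on "how far from a standard Gaussian" is the closed-form KL divergence between each dimension's Gaussian posterior and N(0, 1), averaged over the dataset. A minimal sketch, assuming a diagonal-Gaussian encoder whose outputs are available as NumPy arrays `mu` and `logvar` of shape (num_samples, latent_dim); `kl_per_dim` is a hypothetical helper name and the numbers are invented.

```python
import numpy as np

def kl_per_dim(mu, logvar):
    """Mean KL( N(mu, exp(logvar)) || N(0, 1) ) per latent dimension, in nats."""
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    # Closed form for two univariate Gaussians, applied elementwise.
    kl = 0.5 * (mu ** 2 + np.exp(logvar) - logvar - 1.0)
    # Average over samples, keeping one value per latent dimension.
    return kl.mean(axis=0)

# Toy example (made-up values): dimension 0 is informative,
# dimension 1 matches the prior exactly (mu = 0, logvar = 0).
mu = np.array([[1.5, 0.0], [-1.5, 0.0]])
logvar = np.array([[-2.0, 0.0], [-2.0, 0.0]])

kl = kl_per_dim(mu, logvar)
print(kl)
```

A dimension whose average KL is 0 matches the prior for every sample, so its posterior carries nothing about the input; larger values mean more nats pushed through that dimension.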

I didn’t get your point about looking at the decoder weights to figure out whether they are contributing. Do you compare them to their randomly initialized values to infer this?


grid_world t1_isc9zs1 wrote

Variational Autoencoder automatic latent dimensionality selection

For a given dataset (say, CIFAR-10), if you intentionally make the latent space large, say 1000-d, I am assuming that during training the model will automatically leave unused the dimensions it doesn't need to optimize the reconstruction and KL-divergence losses. Consequently, these unused variables will be equal to, or very close to, a standard Gaussian distribution. Is my hand-wavy reasoning correct? And if so, are there any research papers that prove this?