FrogBearSalamander t1_jc5vvrb wrote

Reply to comment by currentscurrents in [D]: Generalisation ability of autoencoders by Blutorangensaft

> Would love to read some research papers if you have a link!

Any VQ-based model (VQ-VAE, VQ-GAN, etc.) can be interpreted as a compression model. Many generative image models use VQ, but they rarely report rate-distortion results. And, as /u/speyside42 said above, they typically assume a uniform distribution over the codebook, which isn't very interesting from a compression point of view. Instead, you want to learn a distribution over codewords and use it as an entropy model in conjunction with an entropy coder. Note that SoundStream (mentioned above) uses residual VQ (RVQ).
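To make the entropy-model point concrete, here's a minimal sketch (mine, not from any of these papers) comparing the uniform-codebook assumption against a learned categorical prior. The codebook size and the skewed toy index distribution are made-up for illustration:

    import numpy as np

    # Hypothetical setup: a 512-entry codebook and a batch of quantizer
    # indices whose empirical distribution is far from uniform.
    codebook_size = 512
    rng = np.random.default_rng(0)
    indices = rng.zipf(1.5, size=10_000) % codebook_size  # skewed toy indices

    # Uniform assumption: every index costs log2(codebook_size) bits.
    uniform_rate = np.log2(codebook_size)

    # "Learned" entropy model: here just the empirical categorical
    # distribution. An entropy coder driven by these probabilities
    # approaches this entropy in bits per index.
    probs = np.bincount(indices, minlength=codebook_size) / len(indices)
    nonzero = probs[probs > 0]
    learned_rate = -(nonzero * np.log2(nonzero)).sum()

    print(f"uniform assumption: {uniform_rate:.2f} bits/index")
    print(f"learned prior:      {learned_rate:.2f} bits/index")

With a skewed index distribution, the learned prior's entropy sits well below log2(codebook_size), and that gap is exactly the rate an arithmetic or rANS coder can save over the uniform assumption.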
Image Compression with Product Quantized Masked Image Modeling uses a kind of VQ (the latent vectors are subdivided and each sub-vector is coded separately, forming a product quantizer) along with masked image modeling (MIM) to get a conditional distribution over codewords. MIM is usually used for generation, but here the distribution drives an entropy coder rather than a sampler.
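A toy sketch of the product-quantizer idea, if it helps: the latent dimension, number of sub-vectors, and codebook sizes below are assumptions for illustration, not the paper's actual configuration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical shapes: 64-dim latents split into 4 sub-vectors of 16 dims,
    # each with its own 256-entry codebook (so 4 indices per latent vector).
    latent_dim, num_subvectors, codebook_size = 64, 4, 256
    sub_dim = latent_dim // num_subvectors
    codebooks = rng.normal(size=(num_subvectors, codebook_size, sub_dim))

    def product_quantize(z):
        """Quantize one latent vector: nearest codeword per sub-vector."""
        subs = z.reshape(num_subvectors, sub_dim)
        # Squared L2 distance from each sub-vector to every codeword
        # in its own codebook; pick the nearest.
        dists = ((subs[:, None, :] - codebooks) ** 2).sum(-1)
        return dists.argmin(axis=1)  # one index per sub-vector

    z = rng.normal(size=latent_dim)
    idx = product_quantize(z)
    z_hat = np.concatenate([codebooks[g, idx[g]] for g in range(num_subvectors)])
    print(idx, np.linalg.norm(z - z_hat))

The MIM model then supplies a conditional distribution over those per-group indices, and that distribution is what gets entropy coded.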