Vector Quantization using Gaussian Variational Autoencoder
December 7, 2025
Authors: Tongda Xu, Wendi Zheng, Jiajun He, Jose Miguel Hernandez-Lobato, Yan Wang, Ya-Qin Zhang, Jie Tang
cs.AI
Abstract
Vector quantized variational autoencoder (VQ-VAE) is a discrete auto-encoder that compresses images into discrete tokens, but it is difficult to train due to discretization. In this paper, we propose a simple yet effective technique, dubbed Gaussian Quant (GQ), that converts a Gaussian VAE satisfying a certain constraint into a VQ-VAE without training. GQ generates random Gaussian noise as a codebook and selects the noise vector closest to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic for training a Gaussian VAE for effective GQ, named target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves upon previous Gaussian VAE discretization methods, such as TokenBridge. The source code is available at https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE.
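The core GQ step described above can be sketched in a few lines: draw a random Gaussian codebook from a shared seed and snap the Gaussian VAE's posterior mean to the nearest codeword. This is a minimal illustration of the idea only; the function name, codebook size, and latent dimension below are assumptions for the example, not the authors' exact implementation (see the linked repository for that).

```python
import numpy as np

def gaussian_quant(posterior_mean, codebook_size=4096, dim=16, seed=0):
    """Illustrative Gaussian Quant (GQ) step: quantize a posterior mean
    to the nearest entry of a randomly generated Gaussian codebook.
    Hyperparameters here are placeholders, not the paper's settings."""
    # A fixed seed lets encoder and decoder regenerate the same codebook
    # instead of storing or training it.
    rng = np.random.default_rng(seed)
    codebook = rng.standard_normal((codebook_size, dim))
    # Nearest codeword under Euclidean distance to the posterior mean.
    dists = np.linalg.norm(codebook - posterior_mean, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

mu = np.zeros(16)  # toy posterior mean
idx, z_q = gaussian_quant(mu)
```

The discrete token is just the index `idx`; the decoder recovers `z_q` by regenerating the same codebook from the seed and looking up that index. The paper's theory says the quantization error `||z_q - mu||` becomes small once `log(codebook_size)` exceeds the bits-back coding rate of the Gaussian VAE.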