EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
February 13, 2025
Authors: Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis
cs.AI
Abstract
Latent generative models have emerged as a leading approach for high-quality
image synthesis. These models rely on an autoencoder to compress images into a
latent space, followed by a generative model to learn the latent distribution.
We identify that existing autoencoders lack equivariance to semantic-preserving
transformations like scaling and rotation, resulting in complex latent spaces
that hinder generative performance. To address this, we propose EQ-VAE, a
simple regularization approach that enforces equivariance in the latent space,
reducing its complexity without degrading reconstruction quality. By finetuning
pre-trained autoencoders with EQ-VAE, we enhance the performance of several
state-of-the-art generative models, including DiT, SiT, REPA and MaskGIT,
achieving a 7× speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning.
EQ-VAE is compatible with both continuous and discrete autoencoders, thus
offering a versatile enhancement for a wide range of latent generative models.
Project page and code: https://eq-vae.github.io/.
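The equivariance property the abstract describes can be sketched numerically: for a transformation t (e.g. a rotation), an equivariant encoder satisfies E(t(x)) ≈ t(E(x)), and a regularizer can penalize the squared mismatch between the two sides. The toy 2×2 average-pooling "encoder" below is an illustrative stand-in, not the paper's actual VAE encoder or training objective.

```python
import numpy as np

def encode(x):
    # Toy "encoder": 2x2 average pooling, standing in for a VAE encoder
    # that maps an image to a lower-resolution latent map.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def equivariance_loss(x, transform):
    # Penalize the mismatch between encoding a transformed image and
    # applying the same transform to the encoding: || E(t(x)) - t(E(x)) ||^2
    a = encode(transform(x))       # encode the transformed image
    b = transform(encode(x))       # transform the latent of the original
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
rot90 = lambda z: np.rot90(z)     # a semantic-preserving transformation

# Average pooling commutes with 90-degree rotation, so the loss is ~0;
# a non-equivariant encoder would incur a nonzero penalty here.
print(equivariance_loss(x, rot90))
```

A real EQ-VAE-style setup would apply such a penalty during autoencoder fine-tuning alongside the reconstruction loss, so that the latent space transforms consistently under scaling and rotation.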