EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
February 13, 2025
Authors: Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis
cs.AI
Abstract
Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model that learns the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations such as scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By fine-tuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA, and MaskGIT, achieving a 7× speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models. Project page and code: https://eq-vae.github.io/.
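To make the core idea concrete, below is a minimal PyTorch sketch of an equivariance regularizer of the kind the abstract describes: it penalizes the mismatch between encoding a transformed image and applying the same transformation to the latent of the original image. The function name, the specific transforms (90° rotation and 0.5× downscaling), and the plain MSE loss are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def equivariance_loss(encoder, x):
    """Hypothetical sketch: encourage encoder(T(x)) ≈ T(encoder(x))
    for semantic-preserving transforms T.

    encoder: maps images (B, C, H, W) -> latents (B, c, h, w)
    x:       batch of input images
    """
    z = encoder(x)  # latent of the original image

    # Transform 1: 90-degree rotation (exact, no resampling artifacts).
    x_rot = torch.rot90(x, k=1, dims=(-2, -1))
    z_rot = torch.rot90(z, k=1, dims=(-2, -1))
    loss = F.mse_loss(encoder(x_rot), z_rot)

    # Transform 2: 0.5x downscaling, applied to both image and latent.
    x_s = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
    z_s = F.interpolate(z, scale_factor=0.5, mode="bilinear", align_corners=False)
    loss = loss + F.mse_loss(encoder(x_s), z_s)

    return loss

In training, a term like this would presumably be added, with some weighting coefficient, to the autoencoder's usual objective (reconstruction plus KL or quantization losses), so that equivariance is enforced without sacrificing reconstruction quality.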