EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

February 13, 2025
Authors: Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis
cs.AI

Abstract

Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model to learn the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By fine-tuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA and MaskGIT, achieving a ×7 speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models. Project page and code: https://eq-vae.github.io/.
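The equivariance constraint described in the abstract can be written as Enc(τ(x)) ≈ τ(Enc(x)) for a semantic-preserving transformation τ such as rotation or scaling. The sketch below illustrates one plausible way to impose this as an auxiliary loss during autoencoder fine-tuning; it is a minimal illustration, not the authors' released implementation. It assumes an `encoder` that maps an image batch directly to a latent tensor, and the names `sample_transform`, `equivariance_loss`, and the weight `beta` are hypothetical.

```python
import random

import torch
import torch.nn.functional as F


def sample_transform():
    """Sample a semantic-preserving transform tau that can be applied
    to both images and latent maps of shape (B, C, H, W)."""
    if random.random() < 0.5:
        k = random.randint(1, 3)  # rotate by 90, 180, or 270 degrees
        return lambda t: torch.rot90(t, k, dims=(-2, -1))
    # Downscaling; assumes spatial sizes stay divisible by the encoder's
    # downsampling factor (e.g. 256x256 inputs with a stride-8 encoder).
    s = random.choice([0.5, 0.75])
    return lambda t: F.interpolate(t, scale_factor=s, mode="bilinear",
                                   align_corners=False)


def equivariance_loss(encoder, x):
    """Penalize the gap between encoding a transformed image and
    transforming the encoding: Enc(tau(x)) ~= tau(Enc(x))."""
    tau = sample_transform()
    z = encoder(x)              # latent of the original image
    z_tx = encoder(tau(x))      # latent of the transformed image
    tz = tau(z)                 # transformed latent of the original
    return F.mse_loss(z_tx, tz)


# Added to the usual autoencoder objectives during fine-tuning, e.g.:
#   loss = reconstruction_loss + beta * equivariance_loss(encoder, images)
```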
