EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Abstract

Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model that learns the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations such as scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By fine-tuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA, and MaskGIT, achieving a 7× speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models. Project page and code: https://eq-vae.github.io/
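To make the notion of equivariance concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a 90-degree rotation as the transformation and uses two hypothetical stand-in "encoders" (`pool_encoder` and `biased_encoder`, both invented for illustration) to show how one might measure whether encoding a transformed image matches transforming the encoded image.

```python
# Toy illustration only: the encoders below are hypothetical stand-ins,
# not the EQ-VAE autoencoder or its training objective.

def rot90(grid):
    """Rotate a 2-D list of numbers by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def pool_encoder(image):
    """Hypothetical encoder: 2x2 average pooling (halves each side)."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * i][2 * j] + image[2 * i][2 * j + 1]
             + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(cols)
        ]
        for i in range(rows)
    ]

def biased_encoder(image):
    """Hypothetical encoder: pooling plus a row-dependent offset,
    which makes the output depend on absolute position."""
    pooled = pool_encoder(image)
    return [[value + i for value in row] for i, row in enumerate(pooled)]

def equivariance_error(encoder, image, transform):
    """Mean squared difference between encoder(transform(x))
    and transform(encoder(x)); zero means exact equivariance."""
    encoded_then_checked = encoder(transform(image))
    transformed_latent = transform(encoder(image))
    diffs = [
        (p - q) ** 2
        for row_a, row_b in zip(encoded_then_checked, transformed_latent)
        for p, q in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)

# A 4x4 test image with distinct pixel values.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]

err_pool = equivariance_error(pool_encoder, image, rot90)
err_biased = equivariance_error(biased_encoder, image, rot90)
print("pooling encoder error:", err_pool)
print("position-biased encoder error:", err_biased)
```

In this toy setting the pooling encoder commutes with the rotation, so its error is zero, while the position-dependent encoder yields a nonzero error. A regularizer in the spirit described above would penalize such a discrepancy during fine-tuning; the exact loss used by EQ-VAE is not specified in this abstract.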
