Geometric Autoencoder for Diffusion Models

March 11, 2026
Authors: Hangyu Liu, Jianyong Wang, Yutao Sun
cs.AI

Abstract

Latent diffusion models have established a new state of the art in high-resolution visual generation. Integrating priors from Vision Foundation Models (VFMs) improves generative efficiency, yet existing latent designs remain largely heuristic. These approaches often struggle to unify semantic discriminability, reconstruction fidelity, and latent compactness. In this paper, we propose the Geometric Autoencoder (GAE), a principled framework that systematically addresses these challenges. By analyzing various alignment paradigms, GAE constructs an optimized low-dimensional semantic supervision target from VFMs to guide the autoencoder. Furthermore, we replace the restrictive KL-divergence term of standard VAEs with latent normalization, yielding a more stable latent manifold specifically optimized for diffusion learning. To ensure robust reconstruction under high-intensity noise, GAE incorporates a dynamic noise sampling mechanism. Empirically, GAE achieves compelling performance on the ImageNet-1K 256×256 benchmark, reaching a gFID of 1.82 after only 80 epochs and 1.31 after 800 epochs without classifier-free guidance, significantly surpassing existing state-of-the-art methods. Beyond generative quality, GAE strikes a better balance among compression, semantic depth, and reconstruction stability. These results validate our design choices and offer a promising paradigm for latent diffusion modeling. Code and models are publicly available at https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models.
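The abstract contrasts GAE's latent normalization with the KL-divergence regularizer of a standard VAE. The sketch below illustrates that contrast only. The KL term is the usual closed form for a diagonal Gaussian posterior against N(0, I). The function `normalize_latent` is a hypothetical stand-in that standardizes each latent vector to zero mean and unit variance. The abstract does not specify GAE's exact normalization, so this should not be read as the authors' method.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """Standard VAE regularizer: KL(N(mu, sigma^2) || N(0, I)) per sample.

    Uses the closed form 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1),
    summed over the last (latent) axis.
    """
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0, axis=-1)


def normalize_latent(z, eps=1e-6):
    """Hypothetical latent normalization (assumption, not GAE's published form).

    Deterministically standardizes each latent vector over its feature axis,
    so no sampling or KL penalty is involved.
    """
    mean = z.mean(axis=-1, keepdims=True)
    std = z.std(axis=-1, keepdims=True)
    return (z - mean) / (std + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(loc=3.0, scale=5.0, size=(4, 16))  # badly scaled latents
    zn = normalize_latent(z)
    print(zn.mean(axis=-1), zn.std(axis=-1))  # approximately 0 and 1 per row
    print(kl_to_standard_normal(np.zeros(16), np.zeros(16)))  # 0.0 at the prior
```

The design difference is that the KL term only softly pulls the posterior toward the prior and trades off against reconstruction. A normalization step fixes the latent scale by construction, which is one plausible reading of why such a latent could be more stable for a downstream diffusion model.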
March 15, 2026