Equivariant Image Modeling
March 24, 2025
Authors: Ruixiao Dong, Mengde Xu, Zigang Geng, Li Li, Han Hu, Shuyang Gu
cs.AI
Abstract
Current generative models, such as autoregressive and diffusion approaches,
decompose high-dimensional data distribution learning into a series of simpler
subtasks. However, inherent conflicts arise during the joint optimization of
these subtasks, and existing solutions fail to resolve such conflicts without
sacrificing efficiency or scalability. We propose a novel equivariant image
modeling framework that inherently aligns optimization targets across subtasks
by leveraging the translation invariance of natural visual signals. Our method
introduces (1) column-wise tokenization, which enhances translational symmetry
along the horizontal axis, and (2) windowed causal attention, which enforces
consistent contextual relationships across positions. Evaluated on
class-conditioned ImageNet generation at 256x256 resolution, our approach
achieves performance comparable to state-of-the-art autoregressive models while
using fewer
computational resources. Systematic analysis demonstrates that enhanced
equivariance reduces inter-task conflicts, significantly improving zero-shot
generalization and enabling ultra-long image synthesis. This work establishes
the first framework for task-aligned decomposition in generative modeling,
offering insights into efficient parameter sharing and conflict-free
optimization. The code and models are publicly available at
https://github.com/drx-code/EquivariantModeling.
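
The abstract names two architectural ingredients. As a concrete illustration of the first, below is a minimal PyTorch sketch of column-wise tokenization: the image is sliced into full-height columns, so a horizontal shift of the image becomes a pure shift of the token sequence. The class and parameter names are illustrative assumptions, not taken from the released code, which may use a learned tokenizer rather than a single linear projection.

```python
import torch
import torch.nn as nn

class ColumnTokenizer(nn.Module):
    """Map an image (B, C, H, W) to a sequence of W column tokens.

    Each token summarizes one full-height column, so translating the
    image horizontally just shifts the token sequence -- the symmetry
    the equivariant modeling framework is built around.
    """

    def __init__(self, in_channels: int, height: int, dim: int):
        super().__init__()
        # Hypothetical embedding: one linear projection per column.
        self.proj = nn.Linear(in_channels * height, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # (B, C, H, W) -> (B, W, C*H): one flattened vector per column.
        cols = x.permute(0, 3, 1, 2).reshape(b, w, c * h)
        return self.proj(cols)  # (B, W, dim)

# Example: a 256x256 RGB image becomes 256 column tokens.
tokens = ColumnTokenizer(in_channels=3, height=256, dim=768)(
    torch.randn(1, 3, 256, 256)
)
assert tokens.shape == (1, 256, 768)
```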
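For the second ingredient, here is a sketch of the mask a windowed causal attention layer could use: each position attends only to itself and the previous `window - 1` positions, so every prediction subtask sees an identically shaped local context. The function name and the boolean-mask convention (True = attend, as in PyTorch's scaled_dot_product_attention) are assumptions for illustration.

```python
import torch

def windowed_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: position i may attend to
    positions j with i - window < j <= i. True means "attend"."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, (L, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions,   (1, L)
    return (j <= i) & (i - j < window)

# Toy usage with hand-rolled attention scores.
mask = windowed_causal_mask(seq_len=8, window=3)
scores = torch.randn(8, 8).masked_fill(~mask, float("-inf"))
weights = scores.softmax(dim=-1)  # each row mixes at most 3 positions
```

Because every token (after the first few) conditions on the same window size, the per-position prediction tasks are structurally identical, which is the alignment property the abstract credits for reduced inter-task conflict.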