Equivariant Image Modeling
March 24, 2025
Authors: Ruixiao Dong, Mengde Xu, Zigang Geng, Li Li, Han Hu, Shuyang Gu
cs.AI
Abstract
Current generative models, such as autoregressive and diffusion approaches,
decompose high-dimensional data distribution learning into a series of simpler
subtasks. However, inherent conflicts arise during the joint optimization of
these subtasks, and existing solutions fail to resolve such conflicts without
sacrificing efficiency or scalability. We propose a novel equivariant image
modeling framework that inherently aligns optimization targets across subtasks
by leveraging the translation invariance of natural visual signals. Our method
introduces (1) column-wise tokenization which enhances translational symmetry
along the horizontal axis, and (2) windowed causal attention which enforces
consistent contextual relationships across positions. Evaluated on
class-conditioned ImageNet generation at 256x256 resolution, our approach
achieves performance comparable to state-of-the-art AR models while using fewer
computational resources. Systematic analysis demonstrates that enhanced
equivariance reduces inter-task conflicts, significantly improving zero-shot
generalization and enabling ultra-long image synthesis. This work establishes
the first framework for task-aligned decomposition in generative modeling,
offering insights into efficient parameter sharing and conflict-free
optimization. The code and models are publicly available at
https://github.com/drx-code/EquivariantModeling.
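The windowed causal attention the abstract describes constrains every position to the same fixed-size left context, so each per-position prediction subtask sees a structurally identical neighborhood. A minimal sketch of such a mask is below; the function name and parameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def windowed_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean attention mask: entry (i, j) is True iff query i may attend to key j.

    Combines causality (j <= i) with a fixed-size window (i - j < window),
    so every position attends to at most `window` preceding tokens,
    including itself. The identical local context at every position is
    what aligns the per-position subtasks under translation.

    Note: illustrative sketch only; the paper's model may build its mask
    differently (e.g. over column tokens rather than generic positions).
    """
    i = np.arange(seq_len)[:, None]  # query indices as a column vector
    j = np.arange(seq_len)[None, :]  # key indices as a row vector
    return (j <= i) & (i - j < window)

# Example: 6 tokens, each attending to at most 3 tokens of left context.
mask = windowed_causal_mask(seq_len=6, window=3)
```

In a transformer, this boolean mask would be converted to additive form (0 where True, a large negative value where False) and added to the attention logits before the softmax.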