

Equivariant Image Modeling

March 24, 2025
Authors: Ruixiao Dong, Mengde Xu, Zigang Geng, Li Li, Han Hu, Shuyang Gu
cs.AI

Abstract

Current generative models, such as autoregressive and diffusion approaches, decompose high-dimensional data distribution learning into a series of simpler subtasks. However, inherent conflicts arise during the joint optimization of these subtasks, and existing solutions fail to resolve such conflicts without sacrificing efficiency or scalability. We propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks by leveraging the translation invariance of natural visual signals. Our method introduces (1) column-wise tokenization which enhances translational symmetry along the horizontal axis, and (2) windowed causal attention which enforces consistent contextual relationships across positions. Evaluated on class-conditioned ImageNet generation at 256x256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Systematic analysis demonstrates that enhanced equivariance reduces inter-task conflicts, significantly improving zero-shot generalization and enabling ultra-long image synthesis. This work establishes the first framework for task-aligned decomposition in generative modeling, offering insights into efficient parameter sharing and conflict-free optimization. The code and models are publicly available at https://github.com/drx-code/EquivariantModeling.
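The windowed causal attention described in the abstract can be illustrated with a mask in which every position attends only to a fixed-width causal window of preceding tokens, so each prediction subtask sees the same relative context regardless of position. The sketch below is illustrative only; the function name and `window` parameter are assumptions, not the paper's actual implementation.

```python
import numpy as np

def windowed_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean attention mask (True = attention allowed).

    Position i may attend to positions j satisfying i - window < j <= i,
    i.e. itself and the (window - 1) tokens immediately before it. Because
    the allowed context is defined purely by relative offset, the mask is
    the same at every position, matching the translation-equivariant
    design the abstract describes (hypothetical sketch).
    """
    i = np.arange(seq_len)[:, None]  # query positions, shape (seq_len, 1)
    j = np.arange(seq_len)[None, :]  # key positions, shape (1, seq_len)
    return (j <= i) & (j > i - window)
```

Such a mask would typically be passed to an attention layer in place of the standard lower-triangular causal mask, trading unbounded context for position-consistent subtasks.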

