Equivariant Image Modeling

Abstract

Current generative models, such as autoregressive (AR) and diffusion approaches, decompose high-dimensional data distribution learning into a series of simpler subtasks. However, inherent conflicts arise when these subtasks are optimized jointly, and existing solutions fail to resolve such conflicts without sacrificing efficiency or scalability. We propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks by leveraging the translation invariance of natural visual signals. Our method introduces (1) column-wise tokenization, which enhances translational symmetry along the horizontal axis, and (2) windowed causal attention, which enforces consistent contextual relationships across positions. Evaluated on class-conditional ImageNet generation at 256x256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Systematic analysis demonstrates that enhanced equivariance reduces inter-task conflicts, significantly improving zero-shot generalization and enabling ultra-long image synthesis. This work establishes the first framework for task-aligned decomposition in generative modeling, offering insights into efficient parameter sharing and conflict-free optimization. The code and models are publicly available at https://github.com/drx-code/EquivariantModeling.
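The two components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes tokens are raw vertical pixel strips (the actual tokenizer is presumably learned), and the helper names `split_into_columns`, `windowed_causal_mask`, and `relative_offsets`, as well as the window size of 3, are hypothetical choices made only for illustration.

```python
def split_into_columns(image, col_width):
    """Split an H x W image (a list of rows) into vertical strips.

    Assumption: each strip stands in for one column token.
    """
    width = len(image[0])
    if width % col_width != 0:
        raise ValueError("image width must be divisible by col_width")
    return [
        [row[start:start + col_width] for row in image]
        for start in range(0, width, col_width)
    ]


def windowed_causal_mask(num_tokens, window):
    """Build a boolean mask where mask[i][j] is True if token i may attend to token j.

    Token i sees itself and at most window - 1 preceding tokens, never later ones.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [0 <= i - j < window for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def relative_offsets(mask, i):
    """Return the offsets (j - i) of the positions that token i may attend to."""
    return [j - i for j, allowed in enumerate(mask[i]) if allowed]


# Toy 4 x 8 "image" whose pixel value encodes its (row, column) position.
image = [[10 * r + c for c in range(8)] for r in range(4)]
columns = split_into_columns(image, col_width=2)
mask = windowed_causal_mask(num_tokens=len(columns), window=3)
```

In this sketch, every token far enough from the left edge attends to the same set of relative offsets (here -2, -1, and 0), so the prediction subtask at each column has the same form. That is one way to read the abstract's "consistent contextual relationships across positions".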
