Scalable Autoregressive Image Generation with Mamba

Abstract

We introduce AiM, an autoregressive (AR) image generative model based on the Mamba architecture. AiM employs Mamba, a novel state-space model known for strong long-sequence modeling performance with linear time complexity, in place of the Transformers commonly used in AR image generation models, aiming to achieve both superior generation quality and faster inference. Unlike existing methods that adapt Mamba to two-dimensional signals via multi-directional scanning, AiM directly uses the next-token prediction paradigm for autoregressive image generation. This approach avoids the extensive modifications otherwise needed for Mamba to learn 2D spatial representations. By making straightforward, targeted modifications for visual generative tasks, we preserve Mamba's core structure and fully exploit its efficient long-sequence modeling capability and scalability. We provide AiM models at various scales, with parameter counts ranging from 148M to 1.3B. On the ImageNet1K 256×256 benchmark, our best AiM model achieves an FID of 2.21, surpassing all existing AR models of comparable parameter count and remaining highly competitive with diffusion models while running inference 2 to 10 times faster. Code is available at https://github.com/hp-l33/AiM
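
To make the next-token prediction paradigm concrete, the following is a minimal sketch, not AiM's actual implementation. It assumes the image has already been mapped to a 2D grid of discrete token ids by some tokenizer, and that the grid is flattened in raster (row-by-row) order; neither detail is specified in the abstract. All names (`raster_flatten`, `next_token_pairs`, `generate`, `step_fn`, `start`) are hypothetical, and `step_fn` stands in for any sequence model that returns a distribution over the next token.

```python
import random
from typing import Callable, List, Sequence, Tuple


def raster_flatten(grid: Sequence[Sequence[int]]) -> List[int]:
    """Flatten a 2D grid of token ids row by row (assumed raster order)."""
    return [tok for row in grid for tok in row]


def unflatten(tokens: Sequence[int], height: int, width: int) -> List[List[int]]:
    """Inverse of raster_flatten: rebuild the height x width grid."""
    return [list(tokens[r * width:(r + 1) * width]) for r in range(height)]


def next_token_pairs(tokens: Sequence[int], start: int) -> Tuple[List[int], List[int]]:
    """Build (inputs, targets) so that position i predicts tokens[i].

    `start` is a hypothetical start-of-sequence id prepended to the inputs.
    """
    inputs = [start] + list(tokens[:-1])
    targets = list(tokens)
    return inputs, targets


def generate(
    step_fn: Callable[[List[int]], Sequence[float]],
    length: int,
    start: int,
    vocab_size: int,
    seed: int = 0,
) -> List[int]:
    """Sample `length` tokens one at a time from a 1D sequence model.

    step_fn(prefix) is a placeholder returning one non-negative weight per
    vocabulary entry for the next token.
    """
    rng = random.Random(seed)
    seq = [start]
    for _ in range(length):
        weights = step_fn(seq)
        tok = rng.choices(range(vocab_size), weights=weights, k=1)[0]
        seq.append(tok)
    return seq[1:]  # drop the start token
```

In this sketch the model only ever sees a one-dimensional sequence, so no multi-directional scan is involved; the 2D layout is recovered afterwards with `unflatten`. A recurrent state-space model would typically carry a fixed-size state between steps rather than re-reading the whole prefix as `step_fn` does here.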
