Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Abstract

We introduce Lumina-Image 2.0, an advanced text-to-image (T2I) generation framework that makes significant progress over its predecessor, Lumina-Next. Lumina-Image 2.0 is built on two key principles. (1) Unification: it adopts a unified architecture (Unified Next-DiT) that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and allowing seamless task expansion. In addition, because high-quality captioners can provide semantically well-aligned text-image training pairs, we introduce a unified captioning system, Unified Captioner (UniCap), designed specifically for T2I generation tasks. UniCap generates comprehensive and accurate captions, which accelerates convergence and improves prompt adherence. (2) Efficiency: to improve the efficiency of the proposed model, we develop multi-stage progressive training strategies and introduce inference acceleration techniques that do not compromise image quality. Extensive evaluations on academic benchmarks and public text-to-image arenas show that Lumina-Image 2.0 delivers strong performance with only 2.6B parameters, highlighting its scalability and design efficiency. We have released our training details, code, and models at https://github.com/Alpha-VLLM/Lumina-Image-2.0.
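
To make the "joint sequence" idea concrete, the following is a minimal sketch of self-attention over concatenated text and image tokens. It is not the Unified Next-DiT implementation: the function name `joint_self_attention`, the toy 2-dimensional token vectors, and the use of raw tokens as queries, keys, and values (no learned projections, heads, or positional encoding) are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(text_tokens, image_tokens):
    """Single-head self-attention over one concatenated text+image sequence.

    Illustrative simplification (hypothetical helper): the token vectors
    themselves serve as queries, keys, and values.
    """
    seq = text_tokens + image_tokens  # joint sequence: text first, then image
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)

    outputs, weights = [], []
    for q in seq:
        # Every token scores against every token, regardless of modality.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in seq]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * seq[j][d] for j in range(len(seq))) for d in range(dim)]
        )
    return outputs, weights


# Toy inputs: 2 text tokens and 3 image tokens, each of dimension 2.
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]
outputs, weights = joint_self_attention(text, image)
```

Because both modalities share a single sequence, each row of `weights` holds a nonzero attention weight for every text and every image token. That is the cross-modal interaction the abstract refers to, shown here in its simplest form.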
