

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

March 27, 2025
Authors: Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao
cs.AI

Abstract

We introduce Lumina-Image 2.0, an advanced text-to-image (T2I) generation framework that makes significant progress over its predecessor, Lumina-Next. Lumina-Image 2.0 is built on two key principles: (1) Unification: it adopts a unified architecture (Unified Next-DiT) that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and allowing seamless task expansion. In addition, since high-quality captioners can provide semantically well-aligned text-image training pairs, we introduce a unified captioning system, Unified Captioner (UniCap), specifically designed for T2I generation tasks. UniCap excels at generating comprehensive and accurate captions, which accelerates convergence and improves prompt adherence. (2) Efficiency: to improve the efficiency of the proposed model, we develop multi-stage progressive training strategies and introduce inference acceleration techniques that do not compromise image quality. Extensive evaluations on academic benchmarks and public text-to-image arenas show that Lumina-Image 2.0 delivers strong performance even with only 2.6B parameters, highlighting its scalability and design efficiency. We have released our training details, code, and models at https://github.com/Alpha-VLLM/Lumina-Image-2.0.
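To make "treats text and image tokens as a joint sequence" concrete, here is a minimal toy sketch. It concatenates text and image tokens into one sequence and runs a single self-attention operation over it, so image tokens attend to text tokens (and vice versa) without a separate cross-attention block. This is an illustration under loudly stated assumptions, not the Lumina-Image 2.0 / Unified Next-DiT implementation. It uses a single head, no positional embeddings, no normalization, and random weights. The helpers `joint_self_attention`, `matvec`, and `rand_matrix`, along with the toy sizes, are hypothetical and are not taken from the released code.

```python
import math
import random

# Toy sketch (hypothetical, NOT the Lumina-Image 2.0 code): one attention head,
# no positional embeddings, no normalization, random weights.

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(w, x):
    # Multiply a row-major matrix w by the vector x.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def joint_self_attention(text_tokens, image_tokens, wq, wk, wv):
    """Single-head self-attention over the joint sequence [text; image].

    Every query position (text or image) scores every key position, so
    cross-modal mixing happens inside one attention op.
    Returns (text_out, image_out, weights).
    """
    seq = text_tokens + image_tokens  # concatenate into one joint sequence
    q = [matvec(wq, x) for x in seq]
    k = [matvec(wk, x) for x in seq]
    v = [matvec(wv, x) for x in seq]
    scale = 1.0 / math.sqrt(len(q[0]))
    out, weights = [], []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors across the whole joint sequence.
        out.append([sum(w[j] * v[j][c] for j in range(len(seq)))
                    for c in range(len(v[0]))])
    n_text = len(text_tokens)
    # Split the joint output back into its text and image parts.
    return out[:n_text], out[n_text:], weights

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
D = 4
text = rand_matrix(3, D, rng)   # 3 toy text-token embeddings
image = rand_matrix(5, D, rng)  # 5 toy image-patch embeddings
wq, wk, wv = (rand_matrix(D, D, rng) for _ in range(3))
text_out, image_out, weights = joint_self_attention(text, image, wq, wk, wv)
```

In this toy setting, a single set of attention weights both mixes image patches with each other and routes information from the prompt tokens into every patch. Changing any text token therefore changes the image-token outputs. How the actual model arranges its blocks around this idea is not described in the abstract.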

