Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

March 27, 2025
Authors: Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao
cs.AI

Abstract

We introduce Lumina-Image 2.0, an advanced text-to-image (T2I) generation framework that significantly improves on its predecessor, Lumina-Next. Lumina-Image 2.0 is built on two key principles. (1) Unification: it adopts a unified architecture (Unified Next-DiT) that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and allowing seamless task expansion. In addition, because high-quality captioners can provide semantically well-aligned text-image training pairs, we introduce a unified captioning system, Unified Captioner (UniCap), designed specifically for T2I generation tasks. UniCap excels at generating comprehensive and accurate captions, which accelerates convergence and improves prompt adherence. (2) Efficiency: to make the proposed model more efficient, we develop multi-stage progressive training strategies and introduce inference acceleration techniques that do not compromise image quality. Extensive evaluations on academic benchmarks and public text-to-image arenas show that Lumina-Image 2.0 delivers strong performance even with only 2.6B parameters, highlighting its scalability and design efficiency. We have released our training details, code, and models at https://github.com/Alpha-VLLM/Lumina-Image-2.0.
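
The idea of treating text and image tokens as one joint sequence can be illustrated with a minimal sketch. This is not the Unified Next-DiT implementation; it is a toy single-head self-attention in plain Python, and it assumes identity query/key/value projections and tiny hand-written embeddings. The function name `joint_self_attention` and all inputs are hypothetical. It only shows that, once the two token lists are concatenated, every text token can attend to every image token and vice versa.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(text_tokens, image_tokens):
    """Toy single-head self-attention over one concatenated sequence.

    Assumption: Q, K and V are the raw token embeddings (identity
    projections); a real model would use learned projections.
    """
    # Joint sequence: text tokens first, then image tokens.
    seq = text_tokens + image_tokens
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)

    # Attention weights: each token scores every token in the joint sequence.
    weights = []
    for query in seq:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in seq]
        weights.append(softmax(scores))

    # Weighted sum of values for each position.
    outputs = [
        [sum(w * value[j] for w, value in zip(row, seq)) for j in range(dim)]
        for row in weights
    ]

    n_text = len(text_tokens)
    return outputs[:n_text], outputs[n_text:], weights


# Hypothetical 2-dimensional embeddings: 2 text tokens, 3 image tokens.
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]
text_out, image_out, attn = joint_self_attention(text, image)
```

Each row of `attn` spans all five positions, so the first text token places nonzero weight on the three image tokens; this is the cross-modal interaction that a joint sequence gives without a separate cross-attention module.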
