Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
May 5, 2025
Authors: Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang
cs.AI
Abstract
We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a
newly designed unified visual generator and a native multimodal autoregressive
model tailored for unifying vision and language. Specifically, this project
provides an open-source implementation of an integrated MetaQueries and
M2-omni framework, and introduces novel multi-scale learnable tokens and a
multi-scale representation alignment strategy. By pairing a fixed (frozen)
MLLM with a learnable diffusion model, Ming-Lite-Uni enables native multimodal
autoregressive (AR) models to perform both text-to-image generation and
instruction-based image editing, extending their capabilities beyond pure
visual understanding. Our experimental results demonstrate the strong
performance of Ming-Lite-Uni and the fluidity of its interactive process. All
code and model weights are open-sourced to foster further exploration within
the community. Notably, this work aligns with concurrent multimodal AI
milestones, such as ChatGPT-4o with native image generation (updated on March
25, 2025), underscoring the broader significance of unified models like
Ming-Lite-Uni on the path toward AGI. Ming-Lite-Uni is currently in the alpha
stage and will be further refined soon.
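The abstract names multi-scale learnable tokens and a multi-scale representation alignment strategy without detailing them. The following is a toy sketch of that general idea under stated assumptions, not the Ming-Lite-Uni implementation: the scale set (4, 8, 16), the coarse-to-fine ordering, the `frozen_backbone` stand-in (an elementwise `tanh` rather than a real MLLM), the origin of the target features and the cosine-based loss are all hypothetical. It shows how per-scale grids of learnable query tokens could be appended after the text tokens, passed through a frozen model, sliced back out per scale, and aligned to per-scale target features.

```python
import math
import random

# Hypothetical settings: each scale s contributes an s x s grid of query tokens.
SCALES = (4, 8, 16)
DIM = 8  # toy hidden size, chosen only for illustration


def make_learnable_queries(scales, dim, seed=0):
    """Create one (s*s, dim) grid of trainable query tokens per scale (toy init)."""
    rng = random.Random(seed)
    return {
        s: [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(s * s)]
        for s in scales
    }


def build_sequence(text_embeds, queries):
    """Append query grids coarse-to-fine after the text tokens.

    Returns the full token sequence and the (start, end) slice of each scale.
    """
    seq = list(text_embeds)
    spans = {}
    for s in sorted(queries):
        start = len(seq)
        seq.extend(queries[s])
        spans[s] = (start, len(seq))
    return seq, spans


def frozen_backbone(seq):
    """Stand-in for the frozen MLLM: a fixed, parameter-free per-token map.

    A real MLLM would mix information across tokens; this placeholder does not.
    """
    return [[math.tanh(x) for x in tok] for tok in seq]


def cosine(u, v):
    """Cosine similarity with a guard against division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def alignment_loss(hidden, spans, targets):
    """Mean (1 - cosine) between each scale's hidden states and its targets.

    `targets[s]` holds one feature vector per query token at scale s; where these
    come from (e.g. a vision encoder or the diffusion model) is an assumption.
    """
    losses = []
    for s, (lo, hi) in spans.items():
        for h, t in zip(hidden[lo:hi], targets[s]):
            losses.append(1.0 - cosine(h, t))
    return sum(losses) / len(losses)
```

With 5 text tokens, the sequence grows by 16 + 64 + 256 query tokens, and the alignment loss ranges from 0 (perfectly aligned) to 2 (opposite directions). In a real training setup only the query tokens and the diffusion side would receive gradients, since the MLLM stays frozen.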