Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

May 5, 2025
Authors: Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang
cs.AI

Abstract

We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive (AR) model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework and introduces novel multi-scale learnable tokens and a multi-scale representation alignment strategy. By pairing a fixed MLLM with a learnable diffusion model, Ming-Lite-Uni enables native multimodal AR models to perform both text-to-image generation and instruction-based image editing, expanding their capabilities beyond pure visual understanding. Our experimental results demonstrate the strong performance of Ming-Lite-Uni and illustrate the impressive fluidity of its interactive process. All code and model weights are open-sourced to foster further exploration within the community. Notably, this work aligns with concurrent multimodal AI milestones, such as ChatGPT-4o with native image generation (updated on March 25, 2025), underscoring the broader significance of unified models like Ming-Lite-Uni on the path toward AGI. Ming-Lite-Uni is currently in the alpha stage and will soon be further refined.
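
To make the "multi-scale learnable tokens" idea more concrete, the following is a minimal sketch of how one block of learnable query tokens per scale might be laid out in a single sequence, with the MLLM kept fixed and only the tokens and diffusion model marked as trainable. The grid sizes (4, 8, 16), the names `ScaleSpan`, `layout_multiscale_queries`, and `TRAINABLE`, and the contiguous layout itself are illustrative assumptions, not details taken from the abstract or from the released code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScaleSpan:
    side: int   # assumed grid side length for this scale (side x side tokens)
    start: int  # index of the first query token belonging to this scale
    stop: int   # one past the index of the last query token of this scale


def layout_multiscale_queries(sides: List[int]) -> List[ScaleSpan]:
    """Assign a contiguous index range to one block of query tokens per scale."""
    spans = []
    offset = 0
    for side in sides:
        count = side * side  # one token per grid cell at this scale
        spans.append(ScaleSpan(side=side, start=offset, stop=offset + count))
        offset += count
    return spans


# Which components receive gradient updates, following the abstract's
# description of a fixed MLLM and a learnable diffusion model.
# The dictionary keys are hypothetical component names.
TRAINABLE = {
    "mllm": False,            # fixed: weights are not updated
    "multiscale_tokens": True,
    "diffusion_model": True,
}

# Hypothetical scales, coarse to fine.
spans = layout_multiscale_queries([4, 8, 16])
total_tokens = spans[-1].stop

for span in spans:
    print(f"{span.side}x{span.side}: tokens [{span.start}, {span.stop})")
print("total query tokens:", total_tokens)
```

Under these assumed scales, the coarse block would carry global layout information and the finer blocks progressively more detail; the actual scales and token arrangement should be checked against the released implementation.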
