MiMo-VL Technical Report
June 4, 2025
Authors: Xiaomi LLM-Core Team, Zihao Yue, Zhenru Lin, Yifan Song, Weikun Wang, Shuhuai Ren, Shuhao Gu, Shicheng Li, Peidian Li, Liang Zhao, Lei Li, Kainan Bao, Hao Tian, Hailin Zhang, Gang Wang, Dawei Zhu, Cici, Chenhong He, Bowen Ye, Bowen Shen, Zihan Zhang, Zihan Jiang, Zhixian Zheng, Zhichao Song, Zhenbo Luo, Yue Yu, Yudong Wang, Yuanyuan Tian, Yu Tu, Yihan Yan, Yi Huang, Xu Wang, Xinzhe Xu, Xingchen Song, Xing Zhang, Xing Yong, Xin Zhang, Xiangwei Deng, Wenyu Yang, Wenhan Ma, Weiwei Lv, Weiji Zhuang, Wei Liu, Sirui Deng, Shuo Liu, Shimao Chen, Shihua Yu, Shaohui Liu, Shande Wang, Rui Ma, Qiantong Wang, Peng Wang, Nuo Chen, Menghang Zhu, Kangyang Zhou, Kang Zhou, Kai Fang, Jun Shi, Jinhao Dong, Jiebao Xiao, Jiaming Xu, Huaqiu Liu, Hongshen Xu, Heng Qu, Haochen Zhao, Hanglong Lv, Guoan Wang, Duo Zhang, Dong Zhang, Di Zhang, Chong Ma, Chang Liu, Can Cai, Bingquan Xia
cs.AI
Abstract
We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language
models delivering state-of-the-art performance in both general visual
understanding and multimodal reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B
on 35 out of 40 evaluated tasks, and scores 59.4 on OlympiadBench, surpassing
models with up to 78B parameters. For GUI grounding applications, it sets a new
standard with 56.1 on OSWorld-G, even outperforming specialized models such as
UI-TARS. Our training combines four-stage pre-training (2.4 trillion tokens)
with Mixed On-policy Reinforcement Learning (MORL) integrating diverse reward
signals. We identify the importance of incorporating high-quality reasoning
data with long Chain-of-Thought into pre-training stages, and the benefits of
mixed RL despite challenges in simultaneous multi-domain optimization. We also
contribute a comprehensive evaluation suite covering 50+ tasks to promote
reproducibility and advance the field. The model checkpoints and full
evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-VL.