

MiMo-VL Technical Report

June 4, 2025
Authors: Xiaomi LLM-Core Team, Zihao Yue, Zhenru Lin, Yifan Song, Weikun Wang, Shuhuai Ren, Shuhao Gu, Shicheng Li, Peidian Li, Liang Zhao, Lei Li, Kainan Bao, Hao Tian, Hailin Zhang, Gang Wang, Dawei Zhu, Cici, Chenhong He, Bowen Ye, Bowen Shen, Zihan Zhang, Zihan Jiang, Zhixian Zheng, Zhichao Song, Zhenbo Luo, Yue Yu, Yudong Wang, Yuanyuan Tian, Yu Tu, Yihan Yan, Yi Huang, Xu Wang, Xinzhe Xu, Xingchen Song, Xing Zhang, Xing Yong, Xin Zhang, Xiangwei Deng, Wenyu Yang, Wenhan Ma, Weiwei Lv, Weiji Zhuang, Wei Liu, Sirui Deng, Shuo Liu, Shimao Chen, Shihua Yu, Shaohui Liu, Shande Wang, Rui Ma, Qiantong Wang, Peng Wang, Nuo Chen, Menghang Zhu, Kangyang Zhou, Kang Zhou, Kai Fang, Jun Shi, Jinhao Dong, Jiebao Xiao, Jiaming Xu, Huaqiu Liu, Hongshen Xu, Heng Qu, Haochen Zhao, Hanglong Lv, Guoan Wang, Duo Zhang, Dong Zhang, Di Zhang, Chong Ma, Chang Liu, Can Cai, Bingquan Xia
cs.AI

Abstract

We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models delivering state-of-the-art performance in both general visual understanding and multimodal reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35 out of 40 evaluated tasks, and scores 59.4 on OlympiadBench, surpassing models with up to 78B parameters. For GUI grounding applications, it sets a new standard with 56.1 on OSWorld-G, even outperforming specialized models such as UI-TARS. Our training combines four-stage pre-training (2.4 trillion tokens) with Mixed On-policy Reinforcement Learning (MORL) integrating diverse reward signals. We identify the importance of incorporating high-quality reasoning data with long Chain-of-Thought into pre-training stages, and the benefits of mixed RL despite challenges in simultaneous multi-domain optimization. We also contribute a comprehensive evaluation suite covering 50+ tasks to promote reproducibility and advance the field. The model checkpoints and full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-VL.