MiMo-Embodied: X-Embodied Foundation Model Technical Report
November 20, 2025
Authors: Xiaoshuai Hao, Lei Zhou, Zhijian Huang, Zhiwen Hou, Yingbo Tang, Lingfeng Zhang, Guang Li, Zheng Lu, Shuhuai Ren, Xianhui Meng, Yuchen Zhang, Jing Wu, Jinghui Lu, Chenxu Dang, Jiayi Guan, Jianhua Wu, Zhiyi Hou, Hanbing Li, Shumeng Xia, Mingliang Zhou, Yinan Zheng, Zihao Yue, Shuhao Gu, Hao Tian, Yuannan Shen, Jianwei Cui, Wen Zhang, Shaoqing Xu, Bing Wang, Haiyang Sun, Zeyu Zhu, Yuncheng Jiang, Zibin Guo, Chuhong Gong, Chaofan Zhang, Wenbo Ding, Kun Ma, Guang Chen, Rui Cai, Diyun Xiang, Heng Qu, Fuli Luo, Hangjun Ye, Long Chen
cs.AI
Abstract
We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction, and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Perception, Status Prediction, and Driving Planning. Across these tasks, MiMo-Embodied significantly outperforms existing open-source, closed-source, and specialized baselines. Our results indicate that through multi-stage learning, curated data construction, and CoT/RL fine-tuning, these two domains exhibit strong positive transfer and reinforce one another. We provide a detailed analysis of our model design and training methodologies to facilitate further research. Code and models are available at https://github.com/XiaomiMiMo/MiMo-Embodied.
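The abstract names a three-part recipe (multi-stage learning, curated data construction, CoT/RL fine-tuning) without detail. Below is a minimal, hypothetical Python sketch of what such a sequential staged schedule could look like; the stage names, data mixes, and the `Stage`/`run_pipeline` helpers are illustrative assumptions, not the authors' actual training code.

```python
# Hypothetical sketch of a multi-stage training schedule like the one the
# abstract names: supervised stages over curated embodied and driving data,
# followed by CoT fine-tuning and an RL refinement stage. All stage names,
# data mixes, and signatures here are assumptions for illustration only.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    data: List[str]   # curated data sources mixed into this stage
    objective: str    # training objective applied in this stage

def run_pipeline(stages: List[Stage], train: Callable[[Stage], None]) -> None:
    """Run each stage sequentially, reusing the same backbone weights."""
    for stage in stages:
        print(f"[{stage.name}] data={stage.data} objective={stage.objective}")
        train(stage)  # one full training pass per stage

if __name__ == "__main__":
    pipeline = [
        Stage("embodied-sft", ["task planning", "affordance", "spatial QA"], "supervised"),
        Stage("driving-sft", ["perception", "status prediction", "planning"], "supervised"),
        Stage("cot-sft", ["chain-of-thought traces"], "supervised"),
        Stage("rl-refine", ["reward signals"], "reinforcement learning"),
    ]
    run_pipeline(pipeline, train=lambda s: None)  # plug in a real trainer here
```

Running the stages over one shared backbone, rather than training separate embodied and driving models, is what would let the two domains transfer to and reinforce each other, as the abstract reports.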