MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
November 14, 2025
Authors: MiroMind Team, Song Bai, Lidong Bing, Carson Chen, Guanzheng Chen, Yuntao Chen, Zhe Chen, Ziyi Chen, Jifeng Dai, Xuan Dong, Yue Deng, Yunjie Fu, Junqi Ge, Chenxia Han, Tammy Huang, Zhenhang Huang, Jerry Jiao, Shilei Jiang, Tianyu Jiao, Xiaoqi Jian, Lei Lei, Ruilin Li, Ryan Luo, Tiantong Li, Xiang Lin, Ziyuan Liu, Zhiqi Li, Jie Ni, Qiang Ren, Pax Sun, Shiqian Su, Chenxin Tao, Bin Wang, Hellen Wang, Haonan Wang, James Wang, Jin Wang, Jojo Wang, Letian Wang, Shizun Wang, Weizhi Wang, Zixuan Wang, Jinfan Xu, Sen Xing, Chenyu Yang, Hai Ye, Jiaheng Yu, Yue Yu, Muyan Zhong, Tianchen Zhao, Xizhou Zhu, Yanpeng Zhou, Yifan Zhang, Zhi Zhu
cs.AI
Abstract
We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that scale only model size or context length, MiroThinker explores interaction scaling at the model level, systematically training the model to handle deeper and more frequent agent-environment interactions as a third dimension of performance improvement. Unlike LLM test-time scaling, which operates in isolation and risks degradation over long reasoning chains, interaction scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories. Through reinforcement learning, the model achieves efficient interaction scaling: with a 256K context window, it can perform up to 600 tool calls per task, enabling sustained multi-turn reasoning and complex real-world research workflows. Across four representative benchmarks (GAIA, HLE, BrowseComp, and BrowseComp-ZH), the 72B variant achieves 81.9%, 37.7%, 47.1%, and 55.6% accuracy, respectively, surpassing previous open-source agents and approaching commercial counterparts such as GPT-5-high. Our analysis shows that MiroThinker benefits consistently from interaction scaling: research performance improves predictably as the model engages in deeper and more frequent agent-environment interactions, demonstrating that interaction depth exhibits scaling behavior analogous to that of model size and context length. These findings establish interaction scaling as a third critical dimension for building next-generation open research agents, complementary to model capacity and context windows.
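The interaction-scaling loop described above, where the agent alternates model steps with tool calls under a tool-call budget and a context-window budget, can be sketched as follows. This is a minimal illustrative sketch, not the MiroThinker implementation: all names (`run_agent`, `Trajectory`, `toy_model`) are hypothetical, and the token estimate is a deliberately crude placeholder. Only the two budget constants (600 tool calls, 256K-token context) come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_TOOL_CALLS = 600      # interaction-depth budget reported for MiroThinker v1.0
CONTEXT_WINDOW = 256_000  # tokens; bounds how much history the model can condition on


@dataclass
class Trajectory:
    """Accumulated history of the task plus all environment observations."""
    steps: list = field(default_factory=list)

    def token_estimate(self) -> int:
        # Crude stand-in for a real tokenizer: ~1 token per 4 characters.
        return sum(len(s) for s in self.steps) // 4


def run_agent(task: str,
              model: Callable[[str, Trajectory], dict],
              tools: dict[str, Callable[[str], str]]) -> str:
    """Iterate model <-> environment until the model emits a final answer,
    or the tool-call / context budget is exhausted."""
    traj = Trajectory(steps=[task])
    for _ in range(MAX_TOOL_CALLS):
        action = model(task, traj)          # model proposes the next step
        if action["type"] == "answer":
            return action["content"]
        # Environment feedback: run the requested tool, append the observation
        # so the next model step can correct errors and refine the trajectory.
        observation = tools[action["tool"]](action["input"])
        traj.steps.append(observation)
        if traj.token_estimate() > CONTEXT_WINDOW:
            break                           # context budget exhausted
    return "budget exhausted"


# Toy usage: a one-shot "model" that issues a single search call, then answers
# with whatever the tool returned.
def toy_model(task: str, traj: Trajectory) -> dict:
    if len(traj.steps) == 1:  # no observations yet: ask the environment
        return {"type": "tool", "tool": "search", "input": task}
    return {"type": "answer", "content": traj.steps[-1]}


result = run_agent("capital of France?", toy_model, {"search": lambda q: "Paris"})
# result == "Paris"
```

The point of the sketch is that deeper interaction (more loop iterations) trades extra tool calls and context consumption for the chance to incorporate environment feedback, which is the dimension the paper scales.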