MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
November 14, 2025
Authors: MiroMind Team, Song Bai, Lidong Bing, Carson Chen, Guanzheng Chen, Yuntao Chen, Zhe Chen, Ziyi Chen, Jifeng Dai, Xuan Dong, Yue Deng, Yunjie Fu, Junqi Ge, Chenxia Han, Tammy Huang, Zhenhang Huang, Jerry Jiao, Shilei Jiang, Tianyu Jiao, Xiaoqi Jian, Lei Lei, Ruilin Li, Ryan Luo, Tiantong Li, Xiang Lin, Ziyuan Liu, Zhiqi Li, Jie Ni, Qiang Ren, Pax Sun, Shiqian Su, Chenxin Tao, Bin Wang, Hellen Wang, Haonan Wang, James Wang, Jin Wang, Jojo Wang, Letian Wang, Shizun Wang, Weizhi Wang, Zixuan Wang, Jinfan Xu, Sen Xing, Chenyu Yang, Hai Ye, Jiaheng Yu, Yue Yu, Muyan Zhong, Tianchen Zhao, Xizhou Zhu, Yanpeng Zhou, Yifan Zhang, Zhi Zhu
cs.AI
Abstract
We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that only scale up model size or context length, MiroThinker explores interactive scaling at the model level, systematically training the model to handle deeper and more frequent agent-environment interactions as a third dimension of performance improvement. Unlike LLM test-time scaling, which operates in isolation and risks degradation over longer reasoning chains, interactive scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories. Through reinforcement learning, the model achieves efficient interactive scaling: with a 256K context window, it can perform up to 600 tool calls per task, enabling sustained multi-turn reasoning and complex real-world research workflows. Across four representative benchmarks (GAIA, HLE, BrowseComp, and BrowseComp-ZH), the 72B variant achieves 81.9%, 37.7%, 47.1%, and 55.6% accuracy, respectively, surpassing previous open-source agents and approaching commercial counterparts such as GPT-5-high. Our analysis reveals that MiroThinker benefits consistently from interactive scaling: research performance improves predictably as the model engages in deeper and more frequent agent-environment interactions, demonstrating that interaction depth exhibits scaling behavior analogous to that of model size and context length. These findings establish interactive scaling as a third critical dimension for building next-generation open-source research agents, complementing model capacity and context windows.