FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling
October 28, 2025
Authors: Zengzhuang Xu, Bingguang Hao, Zechuan Wang, Yuntao Wen, Maolin Wang, Yang Liu, Long Chen, Dong Wang, Yicheng Chen, Cunyin Peng, Chenyi Zhuang, Jinjie Gu, Leilei Gan, Xiangyu Zhao, Shi Gu
cs.AI
Abstract
Function calling (FC) empowers large language models (LLMs) and autonomous
agents to interface with external tools, a critical capability for solving
complex, real-world problems. As this ability becomes increasingly central to
advanced AI systems, the need for high-quality, multi-turn training data to
develop and refine it cannot be overstated. Existing data synthesis methods,
such as random environment sampling or multi-agent role-playing, struggle to
generate high-quality data in real-world environments. The practical
challenges are threefold: targeted model training, isolation of
tool architecture, and multi-turn logical dependency. To address these
structural deficiencies, we present FunReason-MT, a novel data synthesis
framework for real-world multi-turn tool use. FunReason-MT overcomes the
complexity barrier in multi-turn FC data by employing 1) Environment-API Graph
Interactions to gather varied high-quality trajectories, 2) Advanced Tool-Query
Synthesis to simplify hard query construction, and 3) Guided Iterative Chain
for sophisticated chain-of-thought (CoT) generation. Evaluations on the Berkeley
Function-Calling Leaderboard (BFCLv3) demonstrate the power of our framework: a
4B model trained on FunReason-MT-generated data achieves state-of-the-art
performance among comparably sized models, outperforming most closed-source
models. Further
performance improvements on BFCLv4 confirm that FunReason-MT provides a
reliable and robust source for agentic learning.
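
The abstract does not spell out how the Environment-API Graph is built or traversed, so the following is only a minimal sketch of the general idea of multi-turn logical dependency: a tool call is only valid once the calls it depends on have already been made. The tool names, the `API_DEPENDENCIES` table, and the functions `sample_trajectory` and `is_valid` are all hypothetical and are not taken from FunReason-MT.

```python
import random

# Hypothetical toy API graph (an assumption, not from the report):
# each tool maps to the tools whose results it needs first.
API_DEPENDENCIES = {
    "login": [],
    "search_flights": ["login"],
    "book_flight": ["search_flights"],
    "get_invoice": ["book_flight"],
    "check_weather": [],
}


def sample_trajectory(dependencies, length, seed=None):
    """Sample up to `length` distinct tool calls whose prerequisites always come first."""
    rng = random.Random(seed)
    called = []
    while len(called) < length:
        # A tool is ready when it has not been called yet and all of its
        # prerequisites already appear in the trajectory.
        ready = sorted(
            tool
            for tool, deps in dependencies.items()
            if tool not in called and all(dep in called for dep in deps)
        )
        if not ready:
            break  # nothing left to call
        called.append(rng.choice(ready))
    return called


def is_valid(dependencies, trajectory):
    """Check that every call in the trajectory appears after its prerequisites."""
    seen = set()
    for tool in trajectory:
        if any(dep not in seen for dep in dependencies[tool]):
            return False
        seen.add(tool)
    return True


if __name__ == "__main__":
    trajectory = sample_trajectory(API_DEPENDENCIES, length=4, seed=0)
    print(trajectory, is_valid(API_DEPENDENCIES, trajectory))
```

Sampling only from the currently ready tools keeps each trajectory consistent with the dependency graph by construction, which is one simple way to avoid the invalid call orders that purely random sampling can produce.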