
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text

January 15, 2026
作者: Zhihao Xu, Rumei Li, Jiahuan Li, Rongxiang Weng, Jingang Wang, Xunliang Cai, Xiting Wang
cs.AI

Abstract

Enabling Large Language Models (LLMs) to effectively utilize tools in multi-turn interactions is essential for building capable autonomous agents. However, acquiring diverse and realistic multi-turn tool-use data remains a significant challenge. In this work, we propose a novel text-based paradigm. We observe that textual corpora naturally contain rich, multi-step problem-solving experiences, which can serve as an untapped, scalable, and authentic data source for multi-turn tool-use tasks. Based on this insight, we introduce GEM, a data synthesis pipeline that enables the generation and extraction of multi-turn tool-use trajectories from text corpora through a four-stage process: relevance filtering, workflow & tool extraction, trajectory grounding, and complexity refinement. To reduce the computational cost, we further train a specialized Trajectory Synthesizer via supervised fine-tuning. This model distills the complex generation pipeline into an efficient, end-to-end trajectory generator. Experiments demonstrate that our GEM-32B achieves a 16.5% improvement on the BFCL V3 Multi-turn benchmark. Our models partially surpass the performance of models trained on τ-bench (Airline and Retail) in-domain data, highlighting the superior generalization capability derived from our text-based synthesis paradigm. Notably, our Trajectory Synthesizer matches the quality of the full pipeline while significantly reducing inference latency and costs.
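The abstract only names the four pipeline stages, so a minimal sketch may help fix the data flow in mind. The following Python is purely illustrative: every function, class, and heuristic below (e.g. the keyword filter, the sentence-splitting "workflow extraction") is a placeholder assumption, not the paper's actual method, which relies on LLM-based processing at each stage.

```python
# Hypothetical sketch of a four-stage GEM-style pipeline:
# relevance filtering -> workflow & tool extraction -> trajectory
# grounding -> complexity refinement. All names and heuristics are
# illustrative stand-ins, not the paper's implementation.
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    source_text: str
    workflow: list = field(default_factory=list)  # ordered problem-solving steps
    tools: list = field(default_factory=list)     # tool names inferred per step
    turns: list = field(default_factory=list)     # grounded multi-turn tool calls


def relevance_filter(docs):
    # Stage 1: keep documents that look like multi-step problem solving.
    # (A trivial keyword heuristic standing in for an LLM judge.)
    return [d for d in docs if "step" in d.lower()]


def extract_workflow_and_tools(doc):
    # Stage 2: split the text into an ordered workflow and infer one
    # placeholder tool spec per step.
    steps = [s.strip() for s in doc.split(".") if s.strip()]
    tools = [f"tool_{i}" for i in range(len(steps))]
    return steps, tools


def ground_trajectory(traj):
    # Stage 3: anchor each workflow step to a concrete tool call,
    # producing one interaction turn per step.
    traj.turns = [
        {"turn": i, "tool": t, "args": {"query": s}}
        for i, (s, t) in enumerate(zip(traj.workflow, traj.tools))
    ]
    return traj


def refine_complexity(traj, min_turns=2):
    # Stage 4: discard trajectories without enough multi-turn structure.
    return traj if len(traj.turns) >= min_turns else None


def gem_pipeline(docs):
    out = []
    for doc in relevance_filter(docs):
        steps, tools = extract_workflow_and_tools(doc)
        traj = ground_trajectory(Trajectory(doc, steps, tools))
        traj = refine_complexity(traj)
        if traj:
            out.append(traj)
    return out
```

The key design point the sketch mirrors is that filtering happens before any expensive extraction, and the complexity check happens last, so only fully grounded trajectories are scored for retention.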