DeepAgent: A General Reasoning Agent with Scalable Toolsets
October 24, 2025
Authors: Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Guanting Dong, Jiajie Jin, Yinuo Wang, Hao Wang, Yutao Zhu, Ji-Rong Wen, Yuan Lu, Zhicheng Dou
cs.AI
Abstract
Large reasoning models have demonstrated strong problem-solving abilities,
yet real-world tasks often require external tools and long-horizon
interactions. Existing agent frameworks typically follow predefined workflows,
which limit the agent's autonomy and its ability to complete a task as a whole.
In this paper, we introduce
DeepAgent, an end-to-end deep reasoning agent that performs autonomous
thinking, tool discovery, and action execution within a single, coherent
reasoning process. To address the challenges of long-horizon interactions,
particularly the context length explosion from multiple tool calls and the
accumulation of interaction history, we introduce an autonomous memory folding
mechanism that compresses past interactions into structured episodic, working,
and tool memories, reducing error accumulation while preserving critical
information. To teach general-purpose tool use efficiently and stably, we
develop an end-to-end reinforcement learning strategy, namely ToolPO, that
leverages LLM-simulated APIs and applies tool-call advantage attribution to
assign fine-grained credit to the tool invocation tokens. Extensive experiments
on eight benchmarks, including general tool-use tasks (ToolBench, API-Bank,
TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA,
HLE), demonstrate that DeepAgent consistently outperforms baselines across both
labeled-tool and open-set tool retrieval scenarios. This work takes a step
toward more general and capable agents for real-world applications. The code
and demo are available at https://github.com/RUC-NLPIR/DeepAgent.
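The memory folding mechanism described above can be illustrated with a small data structure. The following Python sketch is a hypothetical illustration, not DeepAgent's implementation: the class names `FoldedMemory` and `AgentContext`, the string-valued memory fields, the `Summarizer` signature, and the toy `keep_last` summarizer are all assumptions. In the real system an LLM would write the summaries, and the abstract says the agent folds its memory autonomously; here `fold` is simply called by hand.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical summarizer signature: (memory kind, raw items) -> summary text.
# In a real agent this would be an LLM call; here it can be any plain function.
Summarizer = Callable[[str, List[str]], str]


@dataclass
class FoldedMemory:
    # The three memory types named in the abstract; storing them as plain strings is an assumption.
    episodic: str = ""  # high-level record of what has happened so far
    working: str = ""   # current sub-goal and planned next steps
    tool: str = ""      # which tools were used and how they behaved


@dataclass
class AgentContext:
    history: List[str] = field(default_factory=list)  # raw, not-yet-folded interaction turns
    memory: FoldedMemory = field(default_factory=FoldedMemory)

    def fold(self, summarize: Summarizer) -> None:
        """Compress the raw history into structured memory, then drop the raw turns."""
        turns = self.history
        self.memory = FoldedMemory(
            # Episodic and tool memory carry over their previous summaries.
            episodic=summarize("episodic", [self.memory.episodic, *turns]),
            # Working memory is rebuilt from the most recent turns only.
            working=summarize("working", turns),
            tool=summarize("tool", [self.memory.tool, *turns]),
        )
        self.history = []

    def prompt(self) -> str:
        """Context shown to the model: folded memory plus turns added since the last fold."""
        parts = [
            f"[episodic] {self.memory.episodic}",
            f"[working] {self.memory.working}",
            f"[tool] {self.memory.tool}",
            *self.history,
        ]
        return "\n".join(parts)


def keep_last(kind: str, items: List[str]) -> str:
    """Toy summarizer for illustration: keeps only the most recent non-empty item."""
    non_empty = [item for item in items if item]
    return non_empty[-1] if non_empty else ""
```

After `fold`, the raw turns are removed from the context and only the three summaries, plus any turns added afterwards, reach the model, which is how folding shortens the context in this sketch.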
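Fine-grained credit assignment to tool-call tokens can likewise be sketched in a few lines. The Python below is one plausible reading of "tool-call advantage attribution", not ToolPO itself. It assumes that each rollout in a group receives an outcome reward and a separate tool-call reward, that both are standardized within the group as in GRPO-style methods, and that the tool-call advantage is added only to tokens flagged by a tool-call mask. The function names `group_normalize` and `token_advantages` are hypothetical.

```python
from typing import List, Sequence


def group_normalize(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within a group of rollouts for the same prompt (GRPO-style assumption)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5  # population standard deviation
    return [(r - mean) / (std + eps) for r in rewards]


def token_advantages(
    outcome_rewards: Sequence[float],
    tool_rewards: Sequence[float],
    tool_masks: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Per-token advantages for a group of rollouts (hypothetical scheme).

    Every token gets its rollout's normalized outcome advantage; tokens inside
    tool calls (mask value 1) additionally get the normalized tool-call advantage.
    """
    a_outcome = group_normalize(outcome_rewards)
    a_tool = group_normalize(tool_rewards)
    return [
        [a_outcome[i] + a_tool[i] * m for m in mask]
        for i, mask in enumerate(tool_masks)
    ]
```

For two rollouts with outcome and tool-call rewards of `[1.0, 0.0]` and masks `[[0, 1, 1], [1, 0, 0]]`, the better rollout's tool-call tokens receive an advantage of about 2.0 and its other tokens about 1.0. The policy gradient therefore pushes hardest on the tool invocations that earned the extra reward.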