DeepAgent: A General Reasoning Agent with Scalable Toolsets

October 24, 2025
Authors: Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Guanting Dong, Jiajie Jin, Yinuo Wang, Hao Wang, Yutao Zhu, Ji-Rong Wen, Yuan Lu, Zhicheng Dou
cs.AI

Abstract

Large reasoning models have demonstrated strong problem-solving abilities, yet real-world tasks often require external tools and long-horizon interactions. Existing agent frameworks typically follow predefined workflows, which limit autonomous and global task completion. In this paper, we introduce DeepAgent, an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution within a single, coherent reasoning process. To address the challenges of long-horizon interactions, particularly the context length explosion from multiple tool calls and the accumulation of interaction history, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories, reducing error accumulation while preserving critical information. To teach general-purpose tool use efficiently and stably, we develop an end-to-end reinforcement learning strategy, namely ToolPO, that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to the tool invocation tokens. Extensive experiments on eight benchmarks, including general tool-use tasks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE), demonstrate that DeepAgent consistently outperforms baselines across both labeled-tool and open-set tool retrieval scenarios. This work takes a step toward more general and capable agents for real-world applications. The code and demo are available at https://github.com/RUC-NLPIR/DeepAgent.
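
To make the memory folding idea concrete, here is a minimal Python sketch of compressing an interaction history into episodic, working, and tool memories. It is an illustration only and not the authors' implementation: the `Step` and `FoldedMemory` schemas, the `fold` function, the `keep_recent` parameter, and the tool names are all hypothetical. The abstract does not specify how the compression is performed, so the rule-based summarizer below is a stand-in for whatever the agent actually does.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One interaction step (hypothetical schema, not from the paper)."""
    thought: str
    tool: Optional[str] = None    # name of the tool called, if any
    result: Optional[str] = None  # raw tool output, if any
    ok: bool = True               # whether the tool call succeeded


@dataclass
class FoldedMemory:
    """Compact replacement for the raw history (hypothetical schema)."""
    episodic: list = field(default_factory=list)  # key events so far
    working: list = field(default_factory=list)   # most recent thoughts
    tool: dict = field(default_factory=dict)      # per-tool usage statistics


def fold(history, keep_recent=2):
    """Compress a list of Step objects into a FoldedMemory.

    Raw tool outputs are dropped; only a one-line outcome per call,
    per-tool counters, and the last `keep_recent` thoughts are kept.
    """
    memory = FoldedMemory()
    for step in history:
        if step.tool is None:
            continue  # pure reasoning steps add no episodic event here
        stats = memory.tool.setdefault(step.tool, {"calls": 0, "failures": 0})
        stats["calls"] += 1
        if not step.ok:
            stats["failures"] += 1
        outcome = "succeeded" if step.ok else "failed"
        memory.episodic.append(f"{step.tool} {outcome}")
    memory.working = [s.thought for s in history[-keep_recent:]]
    return memory


# Usage with made-up tool names and outputs.
history = [
    Step("Need a movie search tool", tool="search_movie", result='{"id": 1}'),
    Step("Lookup failed, retry", tool="get_cast", result="timeout", ok=False),
    Step("Summarize findings"),
]
memory = fold(history)
print(memory)
```

The folded object is much smaller than the raw history because tool outputs are discarded, which is the property the abstract attributes to memory folding; a real system would presumably use a learned or LLM-based summarizer rather than fixed rules.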