

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

August 18, 2025
作者: Yong Deng, Guoqing Wang, Zhenzhe Ying, Xiaofeng Wu, Jinzhen Lin, Wenwen Xiong, Yuqin Dai, Shuo Yang, Zhanwei Zhang, Qiwen Wang, Yang Qin, Changhua Meng
cs.AI

Abstract

Large language models (LLMs) exhibit remarkable problem-solving abilities, but struggle with complex tasks due to static internal knowledge. Retrieval-Augmented Generation (RAG) enhances access to external information, yet remains limited in multi-hop reasoning and strategic search due to rigid workflows. Recent advancements in agentic deep research empower LLMs to autonomously reason, search, and synthesize information. However, current approaches relying on outcome-based reinforcement learning (RL) face critical issues such as conflicting gradients and reward sparsity, limiting performance gains and training efficiency. To address these, we first propose Atomic Thought, a novel LLM thinking paradigm that decomposes reasoning into fine-grained functional units. These units are supervised by Reasoning Reward Models (RRMs), which provide Atomic Thought Rewards (ATR) for fine-grained guidance. Building on this, we propose Atom-Searcher, a novel RL framework for agentic deep research that integrates Atomic Thought and ATR. Atom-Searcher uses a curriculum-inspired reward schedule, prioritizing process-level ATR early and transitioning to outcome rewards, accelerating convergence on effective reasoning paths. Experiments on seven benchmarks show consistent improvements over the state-of-the-art. Key advantages include: (1) Atom-Searcher scales computation at test-time. (2) Atomic Thought provides supervision anchors for RRMs, bridging deep research tasks and RRMs. (3) Atom-Searcher exhibits more interpretable, human-like reasoning patterns.
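The curriculum-inspired reward schedule described above, which prioritizes process-level ATR early in training and transitions toward outcome rewards, can be sketched as a simple weighted interpolation. The abstract does not specify the exact weighting function, so the linear decay, the function name, and all parameters below are illustrative assumptions, not the paper's actual formulation.

```python
def scheduled_reward(atr: float, outcome_reward: float,
                     step: int, total_steps: int) -> float:
    """Hypothetical curriculum-inspired reward schedule.

    atr: process-level Atomic Thought Reward from the RRM.
    outcome_reward: task-level outcome reward (e.g., answer correctness).
    The weight on ATR decays linearly from 1 to 0 over training,
    shifting emphasis from fine-grained process guidance to outcomes.
    """
    alpha = max(0.0, 1.0 - step / total_steps)  # decays 1 -> 0
    return alpha * atr + (1.0 - alpha) * outcome_reward

# Early in training, the process-level ATR dominates;
# late in training, the outcome reward dominates.
early = scheduled_reward(atr=0.8, outcome_reward=0.0, step=0, total_steps=100)
late = scheduled_reward(atr=0.8, outcome_reward=1.0, step=100, total_steps=100)
```

Any monotone decay (linear, cosine, stepwise) fits the described behavior; the key design point is that dense process-level rewards mitigate reward sparsity early on, while outcome rewards anchor final task performance.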