

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

August 18, 2025
作者: Yong Deng, Guoqing Wang, Zhenzhe Ying, Xiaofeng Wu, Jinzhen Lin, Wenwen Xiong, Yuqin Dai, Shuo Yang, Zhanwei Zhang, Qiwen Wang, Yang Qin, Changhua Meng
cs.AI

Abstract

Large language models (LLMs) exhibit remarkable problem-solving abilities, but struggle with complex tasks due to static internal knowledge. Retrieval-Augmented Generation (RAG) enhances access to external information, yet remains limited in multi-hop reasoning and strategic search due to rigid workflows. Recent advancements in agentic deep research empower LLMs to autonomously reason, search, and synthesize information. However, current approaches relying on outcome-based reinforcement learning (RL) face critical issues such as conflicting gradients and reward sparsity, limiting performance gains and training efficiency. To address these, we first propose Atomic Thought, a novel LLM thinking paradigm that decomposes reasoning into fine-grained functional units. These units are supervised by Reasoning Reward Models (RRMs), which provide Atomic Thought Rewards (ATR) for fine-grained guidance. Building on this, we propose Atom-Searcher, a novel RL framework for agentic deep research that integrates Atomic Thought and ATR. Atom-Searcher uses a curriculum-inspired reward schedule, prioritizing process-level ATR early and transitioning to outcome rewards, accelerating convergence on effective reasoning paths. Experiments on seven benchmarks show consistent improvements over the state-of-the-art. Key advantages include: (1) Atom-Searcher scales computation at test-time. (2) Atomic Thought provides supervision anchors for RRMs, bridging deep research tasks and RRMs. (3) Atom-Searcher exhibits more interpretable, human-like reasoning patterns.
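The curriculum-inspired reward schedule described above, which prioritizes process-level ATR early in training and transitions toward outcome rewards, can be sketched as a simple weighted interpolation. The abstract does not specify the exact weighting function, so the linear decay, the function name, and all parameters below are illustrative assumptions, not the paper's actual formulation.

```python
def scheduled_reward(atr: float, outcome_reward: float,
                     step: int, total_steps: int) -> float:
    """Hypothetical curriculum-inspired reward schedule.

    atr: process-level Atomic Thought Reward from the RRM.
    outcome_reward: task-level outcome reward (e.g., answer correctness).
    The weight on ATR decays linearly from 1 to 0 over training,
    shifting emphasis from fine-grained process guidance to outcomes.
    """
    alpha = max(0.0, 1.0 - step / total_steps)  # decays 1 -> 0
    return alpha * atr + (1.0 - alpha) * outcome_reward

# Early in training, the process-level ATR dominates;
# late in training, the outcome reward dominates.
early = scheduled_reward(atr=0.8, outcome_reward=0.0, step=0, total_steps=100)
late = scheduled_reward(atr=0.8, outcome_reward=1.0, step=100, total_steps=100)
```

Any monotone decay (linear, cosine, stepwise) fits the described behavior; the key design point is that dense process-level rewards mitigate reward sparsity early on, while outcome rewards anchor final task performance.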