

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

August 18, 2025
Authors: Yong Deng, Guoqing Wang, Zhenzhe Ying, Xiaofeng Wu, Jinzhen Lin, Wenwen Xiong, Yuqin Dai, Shuo Yang, Zhanwei Zhang, Qiwen Wang, Yang Qin, Changhua Meng
cs.AI

Abstract

Large language models (LLMs) exhibit remarkable problem-solving abilities, but struggle with complex tasks due to static internal knowledge. Retrieval-Augmented Generation (RAG) enhances access to external information, yet remains limited in multi-hop reasoning and strategic search due to rigid workflows. Recent advancements in agentic deep research empower LLMs to autonomously reason, search, and synthesize information. However, current approaches relying on outcome-based reinforcement learning (RL) face critical issues such as conflicting gradients and reward sparsity, limiting performance gains and training efficiency. To address these, we first propose Atomic Thought, a novel LLM thinking paradigm that decomposes reasoning into fine-grained functional units. These units are supervised by Reasoning Reward Models (RRMs), which provide Atomic Thought Rewards (ATR) for fine-grained guidance. Building on this, we propose Atom-Searcher, a novel RL framework for agentic deep research that integrates Atomic Thought and ATR. Atom-Searcher uses a curriculum-inspired reward schedule, prioritizing process-level ATR early and transitioning to outcome rewards, accelerating convergence on effective reasoning paths. Experiments on seven benchmarks show consistent improvements over the state-of-the-art. Key advantages include: (1) Atom-Searcher scales computation at test time. (2) Atomic Thought provides supervision anchors for RRMs, bridging deep research tasks and RRMs. (3) Atom-Searcher exhibits more interpretable, human-like reasoning patterns.
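The curriculum-inspired reward schedule described above can be sketched as a time-dependent blend of the process-level ATR and the outcome reward. This is a minimal illustrative sketch only: the linear decay, the mixing formula, and the function name `curriculum_reward` are assumptions for exposition, not the paper's actual schedule.

```python
def curriculum_reward(atr: float, outcome: float,
                      step: int, total_steps: int) -> float:
    """Blend a process-level Atomic Thought Reward (ATR) with an outcome reward.

    The weight w on the ATR starts at 1.0 (pure process supervision) and
    decays linearly to 0.0 (pure outcome reward) over training, mirroring
    the "prioritize ATR early, transition to outcome rewards" schedule.
    The linear decay is an assumed choice for illustration.
    """
    w = max(0.0, 1.0 - step / total_steps)  # assumed linear decay
    return w * atr + (1.0 - w) * outcome

# Early in training the process-level ATR dominates the signal...
early = curriculum_reward(atr=0.8, outcome=0.0, step=0, total_steps=1000)
# ...while late in training the outcome reward dominates.
late = curriculum_reward(atr=0.8, outcome=1.0, step=1000, total_steps=1000)
```

A schedule shaped this way keeps the gradient signal dense early on (every atomic thought is scored by the RRM) while ensuring the policy ultimately optimizes the sparse task outcome.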
PDF · August 20, 2025