

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

August 18, 2025
Authors: Yong Deng, Guoqing Wang, Zhenzhe Ying, Xiaofeng Wu, Jinzhen Lin, Wenwen Xiong, Yuqin Dai, Shuo Yang, Zhanwei Zhang, Qiwen Wang, Yang Qin, Changhua Meng
cs.AI

Abstract

Large language models (LLMs) exhibit remarkable problem-solving abilities, but struggle with complex tasks due to static internal knowledge. Retrieval-Augmented Generation (RAG) enhances access to external information, yet remains limited in multi-hop reasoning and strategic search due to rigid workflows. Recent advancements in agentic deep research empower LLMs to autonomously reason, search, and synthesize information. However, current approaches relying on outcome-based reinforcement learning (RL) face critical issues such as conflicting gradients and reward sparsity, limiting performance gains and training efficiency. To address these, we first propose Atomic Thought, a novel LLM thinking paradigm that decomposes reasoning into fine-grained functional units. These units are supervised by Reasoning Reward Models (RRMs), which provide Atomic Thought Rewards (ATR) for fine-grained guidance. Building on this, we propose Atom-Searcher, a novel RL framework for agentic deep research that integrates Atomic Thought and ATR. Atom-Searcher uses a curriculum-inspired reward schedule, prioritizing process-level ATR early and transitioning to outcome rewards, accelerating convergence on effective reasoning paths. Experiments on seven benchmarks show consistent improvements over the state-of-the-art. Key advantages include: (1) Atom-Searcher scales computation at test time. (2) Atomic Thought provides supervision anchors for RRMs, bridging deep research tasks and RRMs. (3) Atom-Searcher exhibits more interpretable, human-like reasoning patterns.
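The curriculum-inspired reward schedule described above can be sketched as a time-dependent blend of the process-level ATR and the outcome reward. This is a minimal illustrative sketch only: the linear decay, the mixing formula, and the function name `curriculum_reward` are assumptions for exposition, not the paper's actual schedule.

```python
def curriculum_reward(atr: float, outcome: float,
                      step: int, total_steps: int) -> float:
    """Blend a process-level Atomic Thought Reward (ATR) with an outcome reward.

    The weight w on the ATR starts at 1.0 (pure process supervision) and
    decays linearly to 0.0 (pure outcome reward) over training, mirroring
    the "prioritize ATR early, transition to outcome rewards" schedule.
    The linear decay is an assumed choice for illustration.
    """
    w = max(0.0, 1.0 - step / total_steps)  # assumed linear decay
    return w * atr + (1.0 - w) * outcome

# Early in training the process-level ATR dominates the signal...
early = curriculum_reward(atr=0.8, outcome=0.0, step=0, total_steps=1000)
# ...while late in training the outcome reward dominates.
late = curriculum_reward(atr=0.8, outcome=1.0, step=1000, total_steps=1000)
```

A schedule shaped this way keeps the gradient signal dense early on (every atomic thought is scored by the RRM) while ensuring the policy ultimately optimizes the sparse task outcome.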
PDF · August 20, 2025