Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Abstract

Large language models (LLMs) exhibit remarkable problem-solving abilities but struggle with complex tasks because their internal knowledge is static. Retrieval-Augmented Generation (RAG) improves access to external information, yet its rigid workflows limit multi-hop reasoning and strategic search. Recent advances in agentic deep research enable LLMs to reason, search, and synthesize information autonomously. However, current approaches that rely on outcome-based reinforcement learning (RL) suffer from conflicting gradients and reward sparsity, which limit performance gains and training efficiency. To address these issues, we first propose Atomic Thought, a novel LLM thinking paradigm that decomposes reasoning into fine-grained functional units. These units are supervised by Reasoning Reward Models (RRMs), which provide Atomic Thought Rewards (ATR) for fine-grained guidance. Building on this, we propose Atom-Searcher, a novel RL framework for agentic deep research that integrates Atomic Thought and ATR. Atom-Searcher uses a curriculum-inspired reward schedule that prioritizes process-level ATR early in training and then transitions to outcome rewards, accelerating convergence on effective reasoning paths. Experiments on seven benchmarks show consistent improvements over the state of the art. Key advantages include: (1) Atom-Searcher scales computation at test time. (2) Atomic Thought provides supervision anchors for RRMs, bridging deep research tasks and RRMs. (3) Atom-Searcher exhibits more interpretable, human-like reasoning patterns.
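The curriculum-inspired reward schedule can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the linear decay, the initial weight of 0.5, and the hypothetical names `atr_weight` and `combined_reward`. The sketch only shows the general idea of weighting the process-level ATR heavily at the start of training and shifting toward the outcome reward later.

```python
def atr_weight(step: int, total_steps: int, initial_weight: float = 0.5) -> float:
    """Weight given to the process-level reward at a training step.

    Assumption: the weight decays linearly from `initial_weight` to 0.
    The abstract only says the schedule moves from ATR to outcome rewards.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps stay valid.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return initial_weight * (1.0 - progress)


def combined_reward(outcome_reward: float, atomic_thought_reward: float,
                    step: int, total_steps: int) -> float:
    """Blend the two reward signals using the decaying ATR weight."""
    w = atr_weight(step, total_steps)
    return w * atomic_thought_reward + (1.0 - w) * outcome_reward
```

Under these assumptions, a rollout with a correct final answer but poor intermediate reasoning scores lower early in training than late in training, when the outcome reward dominates.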
