Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Abstract

Large language models (LLMs) exhibit remarkable problem-solving abilities but struggle with complex tasks because their internal knowledge is static. Retrieval-Augmented Generation (RAG) improves access to external information, yet its rigid workflows limit multi-hop reasoning and strategic search. Recent advances in agentic deep research enable LLMs to reason, search, and synthesize information autonomously. However, current approaches that rely on outcome-based reinforcement learning (RL) suffer from conflicting gradients and reward sparsity, which limit both performance gains and training efficiency. To address these issues, we first propose Atomic Thought, a novel LLM thinking paradigm that decomposes reasoning into fine-grained functional units. These units are supervised by Reasoning Reward Models (RRMs), which provide Atomic Thought Rewards (ATR) as fine-grained guidance. Building on this, we propose Atom-Searcher, a novel RL framework for agentic deep research that integrates Atomic Thought and ATR. Atom-Searcher uses a curriculum-inspired reward schedule that prioritizes process-level ATR early in training and then transitions to outcome rewards, accelerating convergence on effective reasoning paths. Experiments on seven benchmarks show consistent improvements over the state of the art. Key advantages include: (1) Atom-Searcher scales computation at test time. (2) Atomic Thought provides supervision anchors for RRMs, bridging deep research tasks and RRMs. (3) Atom-Searcher exhibits more interpretable, human-like reasoning patterns.
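The curriculum-inspired reward schedule described above can be pictured with a minimal sketch. The abstract does not give the schedule's form, so the linear decay, the initial weight of 0.5, and the names `atr_weight` and `blended_reward` are all illustrative assumptions, not the authors' implementation.

```python
def atr_weight(step: int, total_steps: int, initial_weight: float = 0.5) -> float:
    """Weight given to the process-level reward (ATR) at a training step.

    Assumption: the weight decays linearly from `initial_weight` to 0.
    The abstract only says ATR is prioritized early and outcome rewards later.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well-defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return initial_weight * (1.0 - progress)


def blended_reward(atr: float, outcome: float, step: int, total_steps: int) -> float:
    """Mix the process-level reward with the outcome reward.

    Early in training the ATR term contributes more; by the final step
    the reward is purely outcome-based.
    """
    w = atr_weight(step, total_steps)
    return w * atr + (1.0 - w) * outcome
```

Under these assumptions, a rollout with a high ATR but a wrong final answer still receives a partial reward at step 0, while at the last step only the outcome counts.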
