WebResearcher: Unleashing Unbounded Reasoning Capability in Long-Horizon Agents

Abstract

Recent advances in deep-research systems have demonstrated the potential for AI agents to autonomously discover and synthesize knowledge from external sources. In this paper, we introduce WebResearcher, a novel framework for building such agents through two key components: (1) WebResearcher, an iterative deep-research paradigm that reformulates deep research as a Markov Decision Process, in which agents periodically consolidate their findings into an evolving report while maintaining a focused workspace, overcoming the context suffocation and noise contamination that plague existing mono-contextual approaches; and (2) WebFrontier, a scalable data synthesis engine that generates high-quality training data through tool-augmented complexity escalation, enabling the systematic creation of research tasks that bridge the gap between passive knowledge recall and active knowledge construction. Notably, we find that the training data from our paradigm significantly enhances tool-use capabilities even for traditional mono-contextual methods. Furthermore, our paradigm scales naturally through parallel thinking, enabling concurrent multi-agent exploration that yields more comprehensive conclusions. Extensive experiments across six challenging benchmarks demonstrate that WebResearcher achieves state-of-the-art performance, even surpassing frontier proprietary systems.
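The iterative paradigm described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `State`, `iterative_research`, `toy_policy`, and `toy_tool`, as well as the `answer:` prefix convention, are hypothetical and chosen only for illustration. The sketch assumes that the state carried between rounds is just the question and the evolving report, and that each round's workspace is rebuilt from that state plus the most recent observation, so earlier raw tool output never accumulates.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical state: the only memory carried across rounds."""
    question: str
    report: str  # evolving report that consolidates findings so far


# Hypothetical signatures (assumptions, not from the paper):
# policy(question, report, last_observation) -> (updated_report, action)
Policy = Callable[[str, str, str], Tuple[str, str]]
Tool = Callable[[str], str]


def iterative_research(
    question: str, policy: Policy, tool: Tool, max_rounds: int = 5
) -> Tuple[str, str, List[int]]:
    """Run the loop; return (answer, final_report, workspace_sizes)."""
    state = State(question, "")
    observation = ""
    sizes: List[int] = []
    for _ in range(max_rounds):
        # Rebuild the workspace from scratch: question, report, and only
        # the latest observation. Older raw observations are discarded.
        workspace = (state.question, state.report, observation)
        sizes.append(sum(len(part) for part in workspace))
        report, action = policy(*workspace)
        state = State(question, report)  # consolidate findings
        if action.startswith("answer:"):
            return action[len("answer:"):].strip(), state.report, sizes
        observation = tool(action)
    return "", state.report, sizes  # no answer within the round budget


def toy_tool(query: str) -> str:
    """Toy tool: returns a noisy page that contains one useful fact."""
    pages = {
        "search: capital": "noise " * 50 + "FACT capital=Paris",
        "search: river": "noise " * 50 + "FACT river=Seine",
    }
    return pages.get(query, "noise")


def toy_policy(question: str, report: str, observation: str) -> Tuple[str, str]:
    """Toy policy: keep only the fact from the page, then pick the next step."""
    if "FACT" in observation:
        fact = observation.split("FACT", 1)[1].strip()
        report = (report + " " + fact).strip()
    if "capital=" not in report:
        return report, "search: capital"
    if "river=" not in report:
        return report, "search: river"
    return report, "answer: " + report


if __name__ == "__main__":
    answer, report, sizes = iterative_research(
        "What is the capital and its river?", toy_policy, toy_tool
    )
    print(answer)
    print(sizes)
```

In this toy run the workspace size stays roughly constant from round to round, because only the distilled facts enter the report while the noisy page text is dropped. A mono-contextual loop would instead append every observation to a single growing context.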