DuMate-DeepResearch: A Multi-Agent System with Auditable Recursive Search and Rubric-Based Reasoning

Abstract

Deep Research (DR) has emerged as an agentic paradigm for tackling complex, open-ended research tasks. It demands systems that can iteratively frame problems, acquire evidence, verify sources, and synthesize long-form reports. In practice, however, current DR systems are constrained by four interrelated limitations: long-horizon planning over an underspecified scope, the bottleneck of decomposing and scheduling such tasks within a single agent, hallucination risk in long-form synthesis, and limited process auditability. This technical report presents DuMate-DeepResearch, a multi-agent DR framework built on the Qianfan Agent Foundry. The framework decouples the Agent Core, which handles task understanding, planning, and scheduling, from an extensible Tool Ecosystem for retrieval, evidence acquisition, and report rendering, making every intermediate decision and tool invocation explicitly traceable. Building on this infrastructure, DuMate-DeepResearch introduces three mechanisms: (i) a graph-based dynamic planning strategy that expands the research roadmap from coarse to fine and continuously revises it through reflection, re-planning, backtracking, and parallel branching; (ii) a recursive two-level execution design that delegates each complex search sub-task to an inner Search Agent running its own planning loop, isolating noisy retrieval and stabilizing long-horizon execution; and (iii) a rubric-based test-time optimization mechanism that dynamically generates task-specific quality criteria and uses them as live reasoning scaffolds for evidence-grounded synthesis and adaptive stopping. Across two deep research benchmarks, DuMate-DeepResearch establishes new state-of-the-art results: the best overall score (58.03%) on DeepResearch Bench, and the best overall score (61.95%) on DeepResearch Bench II while ranking first in information recall and analysis.
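
As an illustration only, the interplay of two-level delegation, rubric-driven adaptive stopping, and trace logging might be sketched as follows. This is a minimal toy under stated assumptions, not the DuMate-DeepResearch implementation: every name here (`Criterion`, `Trace`, `search_agent`, `research`, `fake_retrieve`) is hypothetical, the rubric is hand-written rather than dynamically generated, and a keyword filter stands in for real retrieval and verification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Criterion:
    """One rubric item (hypothetical): satisfied once `check` returns True."""
    name: str
    check: Callable[[list[str]], bool]


@dataclass
class Trace:
    """Append-only log of decisions and tool calls, for auditability."""
    events: list[tuple[str, str]] = field(default_factory=list)

    def log(self, kind: str, detail: str) -> None:
        self.events.append((kind, detail))


def search_agent(subtask: str, retrieve: Callable[[str], list[str]],
                 trace: Trace, max_steps: int = 3) -> list[str]:
    """Inner loop: retries retrieval and hands back only filtered evidence,
    so noisy hits never reach the outer planner."""
    for step in range(max_steps):
        query = f"{subtask} (attempt {step + 1})"
        trace.log("tool_call", query)
        hits = retrieve(query)
        # Toy relevance filter (assumption): keep hits that mention the subtask.
        kept = [h for h in hits if subtask.lower() in h.lower()]
        if kept:
            return kept
    return []


def research(subtasks: list[str], rubric: list[Criterion],
             retrieve: Callable[[str], list[str]], trace: Trace) -> list[str]:
    """Outer loop: delegates each sub-task and stops once the rubric is met."""
    evidence: list[str] = []
    for subtask in subtasks:
        unmet = [c.name for c in rubric if not c.check(evidence)]
        if not unmet:
            trace.log("stop", "all rubric criteria satisfied")
            break
        trace.log("plan", f"{subtask}; unmet={unmet}")
        evidence.extend(search_agent(subtask, retrieve, trace))
    return evidence


def fake_retrieve(query: str) -> list[str]:
    """Stand-in for a real search tool: one relevant hit plus noise."""
    topic = query.split(" (")[0]
    return [f"{topic}: finding", "unrelated noise"]


rubric = [
    Criterion("covers pricing", lambda ev: any("pricing" in e for e in ev)),
    Criterion("covers market size", lambda ev: any("market size" in e for e in ev)),
]
trace = Trace()
evidence = research(["pricing", "market size", "history"], rubric,
                    fake_retrieve, trace)
```

In this toy run the third sub-task is never searched, because both rubric criteria are already satisfied after the second; the stop decision, like every plan step and tool call before it, is recorded in `trace.events`.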