Deep Researcher with Test-Time Diffusion

Abstract

Deep research agents powered by Large Language Models (LLMs) are advancing rapidly, yet their performance often plateaus when they generate complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TTD-DR). This novel framework conceptualizes research report generation as a diffusion process. TTD-DR initiates this process with a preliminary draft, an updatable skeleton that serves as an evolving foundation to guide the research direction. The draft is then iteratively refined through a "denoising" process, which is dynamically informed by a retrieval mechanism that incorporates external information at each step. The core process is further enhanced by a self-evolutionary algorithm applied to each component of the agentic workflow, ensuring the generation of high-quality context for the diffusion process. This draft-centric design makes the report writing process more timely and coherent while reducing information loss during the iterative search process. We demonstrate that TTD-DR achieves state-of-the-art results on a wide array of benchmarks that require intensive search and multi-hop reasoning, significantly outperforming existing deep research agents.
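
The draft-then-denoise loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name here (`draft_report`, `next_query`, `denoise`, `research`, `toy_retriever`) is a hypothetical placeholder, the LLM and search calls are replaced by deterministic string stubs, and the self-evolutionary component is omitted entirely. The sketch only shows the control flow, in which the current draft steers each retrieval and each retrieval revises the draft.

```python
from typing import Callable, List

# All names below are hypothetical placeholders for illustration only;
# they are assumptions, not part of TTD-DR or any published API.
Retriever = Callable[[str], List[str]]


def draft_report(question: str) -> List[str]:
    """Produce a preliminary draft: a skeleton with one unsupported line."""
    return [f"[unverified] {question}"]


def next_query(question: str, draft: List[str]) -> str:
    """Use the current draft to decide what to search for next."""
    return f"{question} (evidence for draft line {len(draft)})"


def denoise(draft: List[str], evidence: List[str]) -> List[str]:
    """One 'denoising' step: revise the draft using retrieved evidence."""
    return draft + [f"[supported] {e}" for e in evidence]


def research(question: str, retrieve: Retriever, steps: int = 3) -> List[str]:
    """Iteratively refine a draft, retrieving external information each step."""
    draft = draft_report(question)
    for _ in range(steps):
        query = next_query(question, draft)  # the draft guides the search
        evidence = retrieve(query)           # external information per step
        draft = denoise(draft, evidence)     # update the draft, do not restart
    return draft


def toy_retriever(query: str) -> List[str]:
    """Stand-in for a search tool; returns one fake snippet per query."""
    return [f"snippet for: {query}"]


if __name__ == "__main__":
    for line in research("What is test-time diffusion?", toy_retriever, steps=2):
        print(line)
```

In a real system the three stub functions would presumably be backed by LLM calls and a search tool. The point of the sketch is that the draft persists across steps, which is the property the abstract credits with reducing information loss during iterative search.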
