Deep Researcher with Test-Time Diffusion
July 21, 2025
Authors: Rujun Han, Yanfei Chen, Zoey CuiZhu, Lesly Miculicich, Guan Sun, Yuanjun Bi, Weiming Wen, Hui Wan, Chunfeng Wen, Solène Maître, George Lee, Vishy Tirumalashetty, Emily Xue, Zizhao Zhang, Salem Haykal, Burak Gokturk, Tomas Pfister, Chen-Yu Lee
cs.AI
Abstract
Deep research agents, powered by Large Language Models (LLMs), are rapidly
advancing; yet, their performance often plateaus when generating complex,
long-form research reports using generic test-time scaling algorithms. Drawing
inspiration from the iterative nature of human research, which involves cycles
of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep
Researcher (TTD-DR). This novel framework conceptualizes research report
generation as a diffusion process. TTD-DR initiates this process with a
preliminary draft, an updatable skeleton that serves as an evolving foundation
to guide the research direction. The draft is then iteratively refined through
a "denoising" process, which is dynamically informed by a retrieval mechanism
that incorporates external information at each step. The core process is
further enhanced by a self-evolutionary algorithm applied to each component of
the agentic workflow, ensuring the generation of high-quality context for the
diffusion process. This draft-centric design makes the report writing process
more timely and coherent while reducing information loss during the iterative
search process. We demonstrate that our TTD-DR achieves state-of-the-art
results on a wide array of benchmarks that require intensive search and
multi-hop reasoning, significantly outperforming existing deep research agents.
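
The draft-centric loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`refine_report`, `toy_draft`, `toy_query`, `toy_retrieve`, `toy_revise`), the `[TODO: ...]` placeholder convention, and the in-memory `CORPUS` are all assumptions made for illustration. In a real agent, the draft, query, and revision steps would presumably be LLM calls and retrieval would hit a search tool. The self-evolutionary component is omitted entirely.

```python
import re
from typing import Callable, List

def refine_report(
    question: str,
    make_draft: Callable[[str], str],
    make_query: Callable[[str, str], str],
    retrieve: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    steps: int = 3,
) -> str:
    """Start from a preliminary draft, then repeatedly retrieve evidence
    guided by the current draft and revise the draft with it."""
    draft = make_draft(question)
    for _ in range(steps):
        query = make_query(question, draft)  # the draft steers the search
        evidence = retrieve(query)
        if not evidence:                     # nothing left to fill in
            break
        draft = revise(draft, evidence)      # one "denoising" step
    return draft

# --- Hypothetical toy components (stand-ins for LLM and search calls) ---

CORPUS = {
    "retrieval": "Retrieval supplies external information at each step.",
    "self-evolution": "Self-evolution improves each workflow component.",
}

def toy_draft(question: str) -> str:
    # A skeleton with explicit gaps that later steps should fill.
    return f"Report on {question}: [TODO: retrieval] [TODO: self-evolution]"

def toy_query(question: str, draft: str) -> str:
    # Search for the first unresolved gap in the draft.
    match = re.search(r"\[TODO: ([^\]]+)\]", draft)
    return match.group(1) if match else ""

def toy_retrieve(query: str) -> List[str]:
    return [CORPUS[query]] if query in CORPUS else []

def toy_revise(draft: str, evidence: List[str]) -> str:
    # Replace the first gap with the retrieved evidence.
    return re.sub(r"\[TODO: [^\]]+\]", lambda _m: evidence[0], draft, count=1)

if __name__ == "__main__":
    print(refine_report("TTD-DR", toy_draft, toy_query, toy_retrieve, toy_revise))
```

The point of the sketch is the control flow: the search query is derived from the current draft rather than from the question alone, so each retrieval is targeted at what the report still lacks.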