FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

February 2, 2026
Authors: Chiwei Zhu, Benfeng Xu, Mingxuan Du, Shaohan Wang, Xiaorui Wang, Zhendong Mao, Yongdong Zhang
cs.AI

Abstract

Deep research is emerging as a representative long-horizon task for large language model (LLM) agents. However, long trajectories in deep research often exceed model context limits, which compresses the token budgets for both evidence collection and report writing and prevents effective test-time scaling. We introduce FS-Researcher, a file-system-based, dual-agent framework that scales deep research beyond the context window via a persistent workspace. Specifically, a Context Builder agent acts as a librarian: it browses the internet, writes structured notes, and archives raw sources into a hierarchical knowledge base that can grow far beyond the context length. A Report Writer agent then composes the final report section by section, treating the knowledge base as the source of facts. In this framework, the file system serves as a durable external memory and a shared coordination medium across agents and sessions, enabling iterative refinement beyond the context window. Experiments on two open-ended benchmarks (DeepResearch Bench and DeepConsult) show that FS-Researcher achieves state-of-the-art report quality across different backbone models. Further analyses show a positive correlation between final report quality and the computation allocated to the Context Builder, validating effective test-time scaling under the file-system paradigm. The code and data are anonymously open-sourced at https://github.com/Ignoramus0817/FS-Researcher.
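
The division of labor described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the directory layout (sources/ and notes/), the function names, and the note format are all assumptions made for illustration, and the LLM calls that would browse the web and draft text are replaced by plain strings. It shows only the idea that one agent persists sources and notes to disk and a second agent later reads them back one section at a time.

```python
from pathlib import Path
import tempfile


def archive_source(kb: Path, topic: str, name: str, raw_text: str) -> Path:
    """Context Builder step (hypothetical): store a raw source under sources/<topic>/."""
    path = kb / "sources" / topic / f"{name}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(raw_text, encoding="utf-8")
    return path


def write_note(kb: Path, topic: str, note: str, source_path: Path) -> Path:
    """Context Builder step (hypothetical): append a note that cites its archived source."""
    path = kb / "notes" / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    citation = source_path.relative_to(kb).as_posix()
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {note} [source: {citation}]\n")
    return path


def write_report(kb: Path, report_path: Path, sections: list[str]) -> str:
    """Report Writer step (hypothetical): build the report one section at a time.

    Only the notes for the current section are read, so the whole knowledge
    base never has to be held in memory (or in a context window) at once.
    """
    parts = []
    for topic in sections:
        note_file = kb / "notes" / f"{topic}.md"
        notes = note_file.read_text(encoding="utf-8") if note_file.exists() else ""
        parts.append(f"## {topic}\n{notes}")
    report_path.parent.mkdir(parents=True, exist_ok=True)
    report_path.write_text("\n".join(parts), encoding="utf-8")
    return report_path.read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        kb = Path(workdir) / "knowledge_base"
        src = archive_source(kb, "background", "page1", "raw page text (placeholder)")
        write_note(kb, "background", "placeholder note", src)
        print(write_report(kb, Path(workdir) / "report.md", ["background"]))
```

Because the workspace lives on disk, the builder functions can be called across many separate sessions before the report is written, which is the property the abstract relies on for test-time scaling.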