FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

Abstract


Deep research is emerging as a representative long-horizon task for large language model (LLM) agents. However, long trajectories in deep research often exceed model context limits, compressing token budgets for both evidence collection and report writing, and preventing effective test-time scaling. We introduce FS-Researcher, a file-system-based, dual-agent framework that scales deep research beyond the context window via a persistent workspace. Specifically, a Context Builder agent acts as a librarian that browses the internet, writes structured notes, and archives raw sources into a hierarchical knowledge base that can grow far beyond context length. A Report Writer agent then composes the final report section by section, treating the knowledge base as the source of facts. In this framework, the file system serves as a durable external memory and a shared coordination medium across agents and sessions, enabling iterative refinement beyond the context window. Experiments on two open-ended benchmarks (DeepResearch Bench and DeepConsult) show that FS-Researcher achieves state-of-the-art report quality across different backbone models. Further analyses demonstrate a positive correlation between final report quality and the computation allocated to the Context Builder, validating effective test-time scaling under the file-system paradigm. The code and data are anonymously open-sourced at https://github.com/Ignoramus0817/FS-Researcher.
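The division of labor described in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the FS-Researcher implementation: the directory layout (`kb/<topic>/notes.md`, `kb/<topic>/sources/`, `report/`) and the helper names `archive_source`, `write_section`, and `assemble_report` are hypothetical, and the LLM calls that would drive each agent are replaced by placeholders. What it shows is the coordination pattern: a Context-Builder-style step persists raw sources and notes to disk, and a Report-Writer-style step drafts each section from those files rather than from a shared context window.

```python
from pathlib import Path
import tempfile

# Hypothetical workspace layout (an assumption, not the paper's actual schema):
#   <workspace>/kb/<topic>/notes.md          structured notes
#   <workspace>/kb/<topic>/sources/<id>.txt  raw archived sources
#   <workspace>/report/<nn>_<topic>.md       one file per report section


def archive_source(workspace: Path, topic: str, source_id: str,
                   raw_text: str, note: str) -> None:
    """Context-Builder-style step (hypothetical): persist a raw source and a note."""
    topic_dir = workspace / "kb" / topic
    (topic_dir / "sources").mkdir(parents=True, exist_ok=True)
    (topic_dir / "sources" / f"{source_id}.txt").write_text(raw_text, encoding="utf-8")
    # Append the note with a pointer back to its raw source so claims stay traceable.
    with open(topic_dir / "notes.md", "a", encoding="utf-8") as f:
        f.write(f"- {note} [source: {source_id}]\n")


def write_section(workspace: Path, index: int, topic: str) -> Path:
    """Report-Writer-style step (hypothetical): draft one section from notes on disk."""
    notes = (workspace / "kb" / topic / "notes.md").read_text(encoding="utf-8")
    # Placeholder: a real agent would prompt an LLM with these notes here.
    body = f"## {topic}\n\n{notes}"
    out = workspace / "report" / f"{index:02d}_{topic}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(body, encoding="utf-8")
    return out


def assemble_report(workspace: Path) -> str:
    """Concatenate section files in index order (zero-padded names sort correctly)."""
    parts = sorted((workspace / "report").glob("*.md"))
    return "\n".join(p.read_text(encoding="utf-8") for p in parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        # Placeholder sources and notes, not real data.
        archive_source(ws, "topic_a", "s1", "raw text of source 1", "example note about topic A")
        archive_source(ws, "topic_b", "s2", "raw text of source 2", "example note about topic B")
        for i, topic in enumerate(["topic_a", "topic_b"]):
            write_section(ws, i, topic)
        print(assemble_report(ws))
```

In this sketch, every intermediate artifact lives on disk, so the knowledge base is not bounded by any single model call, and a later session (or a different agent) can resume from the same directory.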
