Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

Abstract


Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common workaround is to decompose documents into chunks and assemble answers from chunk-level outputs, but this introduces an aggregation bottleneck: as the number of chunks grows, systems must still combine and reason over an increasingly large body of extracted evidence. We present SLIDERS, a framework for question answering over long document collections through structured reasoning. SLIDERS extracts salient information into a relational database, enabling scalable reasoning over persistent structured state via SQL rather than concatenated text. To make this locally extracted representation globally coherent, SLIDERS introduces a data reconciliation stage that leverages provenance, extraction rationales, and metadata to detect and repair duplicated, inconsistent, and incomplete records. SLIDERS outperforms all baselines on three existing long-context benchmarks, even though all three fit within the context window of strong base LLMs, exceeding GPT-4.1 by 6.6 points on average. It also improves over the next best baseline by ~19 and ~32 points on two new benchmarks of 3.9M and 36M tokens, respectively.
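To make the extract-then-query idea concrete, here is a minimal sketch using Python's standard-library sqlite3. It is not the SLIDERS implementation: the per-chunk records, the table names (raw_facts, facts), the schema (company, fiscal_year, revenue_musd), the normalize helper, and the reconciliation rule are all hypothetical. In SLIDERS an LLM would produce the records, and reconciliation would draw on extraction rationales and metadata. Here a crude string key merges duplicates, and each merged row keeps its provenance (document and chunk IDs).

```python
import sqlite3

# Hypothetical per-chunk extraction output (an LLM would produce these).
# Each record carries provenance: (doc_id, chunk_id) plus extracted fields.
extracted = [
    # (doc_id, chunk_id, company, fiscal_year, revenue_musd)
    ("doc_a", 3, "Acme Corp", 2022, 120.0),
    ("doc_a", 17, "ACME Corp.", 2022, None),  # duplicate entity, revenue missing
    ("doc_b", 5, "Globex", 2022, 80.0),
    ("doc_b", 9, "Globex", 2023, 95.0),
]


def normalize(name: str) -> str:
    """Hypothetical entity key: lowercase and keep only alphanumerics."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE raw_facts (
           doc_id TEXT, chunk_id INTEGER,
           company TEXT, company_key TEXT,
           fiscal_year INTEGER, revenue_musd REAL)"""
)
conn.executemany(
    "INSERT INTO raw_facts VALUES (?, ?, ?, ?, ?, ?)",
    [(d, c, co, normalize(co), y, r) for d, c, co, y, r in extracted],
)

# Naive reconciliation: merge rows sharing (company_key, fiscal_year).
# MAX ignores NULLs, so a missing revenue is filled from the duplicate.
# Conflicting non-null values would silently resolve to the larger one;
# a real reconciliation stage should flag such conflicts instead.
conn.execute(
    """CREATE TABLE facts AS
       SELECT company_key, fiscal_year,
              MIN(company) AS company,
              MAX(revenue_musd) AS revenue_musd,
              GROUP_CONCAT(doc_id || ':' || chunk_id) AS provenance
       FROM raw_facts
       GROUP BY company_key, fiscal_year"""
)

# Answer a question with SQL over the reconciled state, not concatenated text.
total_2022 = conn.execute(
    "SELECT SUM(revenue_musd) FROM facts WHERE fiscal_year = 2022"
).fetchone()[0]
print(total_2022)  # 200.0
```

Without the reconciliation step, the two Acme records would be treated as separate entities. The aggregation query itself stays the same size however many chunks fed the table, which is the scalability argument the abstract makes.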
