TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Abstract

Text-to-SQL parsing has achieved remarkable progress under the Full Schema Assumption. However, this premise fails in real-world enterprise environments, where databases contain hundreds of tables with large volumes of noisy metadata. Rather than injecting the full schema upfront, an agent must actively identify and verify only the relevant subset, giving rise to the Unknown Schema scenario we study in this work. To address this, we propose TRUST-SQL (Truthful Reasoning with Unknown Schema via Tools). We formulate the task as a Partially Observable Markov Decision Process in which our autonomous agent follows a structured four-phase protocol to ground its reasoning in verified metadata. Crucially, this protocol provides a structural boundary for our novel Dual-Track GRPO strategy. By applying token-level masked advantages, this strategy isolates exploration rewards from execution outcomes to resolve the credit assignment problem, yielding a 9.9% relative improvement over standard GRPO. Extensive experiments across five benchmarks demonstrate that TRUST-SQL achieves average absolute improvements of 30.6% and 16.6% for the 4B and 8B variants, respectively, over their base models. Remarkably, despite operating entirely without pre-loaded metadata, our framework consistently matches or surpasses strong baselines that rely on schema prefilling.
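The idea of token-level masked advantages can be illustrated with a minimal sketch. The abstract does not specify the reward definitions, the normalization, or how tokens are assigned to tracks, so everything below is an assumption made for illustration: each rollout receives two hypothetical scalar rewards (one for schema exploration, one for SQL execution), each reward is normalized within the rollout group in the usual GRPO manner, and a hypothetical per-token mask decides which of the two advantages a token receives. The function names are illustrative and not taken from the paper.

```python
from statistics import mean, pstdev


def group_normalize(rewards, eps=1e-6):
    """Group-relative advantages in the GRPO style: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_track_token_advantages(explore_rewards, exec_rewards, explore_masks):
    """Assign each token the advantage of the track it belongs to.

    explore_masks[i][t] is 1 if token t of rollout i was generated during
    schema exploration and 0 if it belongs to SQL generation. This split is
    an assumption of the sketch, not the paper's documented scheme.
    """
    adv_explore = group_normalize(explore_rewards)
    adv_exec = group_normalize(exec_rewards)
    per_token = []
    for a_exp, a_exe, mask in zip(adv_explore, adv_exec, explore_masks):
        # Exploration tokens see only the exploration advantage;
        # all other tokens see only the execution advantage.
        per_token.append([a_exp if m else a_exe for m in mask])
    return per_token


# Two hypothetical rollouts: the first explored well but produced wrong SQL,
# the second explored poorly but happened to produce correct SQL.
explore_rewards = [1.0, 0.0]
exec_rewards = [0.0, 1.0]
explore_masks = [[1, 1, 0], [1, 0, 0]]

token_advantages = dual_track_token_advantages(
    explore_rewards, exec_rewards, explore_masks
)
```

In this toy group, the first rollout's exploration tokens receive a positive advantage even though its final query failed, while its SQL token receives a negative one. With a single shared advantage per rollout, both kinds of token would be pushed in the same direction, which is the credit assignment problem the abstract refers to.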
