TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Abstract

Text-to-SQL parsing has achieved remarkable progress under the Full Schema Assumption. However, this premise fails in real-world enterprise environments, where databases contain hundreds of tables and large amounts of noisy metadata. Rather than injecting the full schema upfront, an agent must actively identify and verify only the relevant subset, giving rise to the Unknown Schema scenario we study in this work. To address this, we propose TRUST-SQL (Truthful Reasoning with Unknown Schema via Tools). We formulate the task as a Partially Observable Markov Decision Process in which our autonomous agent employs a structured four-phase protocol to ground reasoning in verified metadata. Crucially, this protocol provides a structural boundary for our novel Dual-Track GRPO strategy. By applying token-level masked advantages, this strategy isolates exploration rewards from execution outcomes to resolve credit assignment, yielding a 9.9% relative improvement over standard GRPO. Extensive experiments across five benchmarks demonstrate that TRUST-SQL achieves average absolute improvements of 30.6% and 16.6% for the 4B and 8B variants, respectively, over their base models. Remarkably, despite operating entirely without pre-loaded metadata, our framework consistently matches or surpasses strong baselines that rely on schema prefilling.
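To make the idea of token-level masked advantages concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each rollout receives two scalar rewards (one for schema exploration, one for SQL execution), that each reward is normalized within its rollout group in the usual GRPO manner, and that a per-token mask marks which tokens belong to the exploration part of the trajectory. The function names, the mask layout, and the reward values are all hypothetical; the abstract does not specify them.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-6):
    """GRPO-style normalization inside one rollout group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_track_advantages(explore_rewards, exec_rewards, explore_masks):
    """Build per-token advantages for every rollout in a group.

    explore_masks[i][t] is 1 when token t of rollout i is an exploration token
    and 0 when it is an SQL-generation token. This split is an assumption made
    for illustration; the abstract only says the protocol supplies the boundary.
    """
    a_explore = group_relative(explore_rewards)  # track 1: exploration reward
    a_exec = group_relative(exec_rewards)        # track 2: execution outcome
    per_token = []
    for a_x, a_e, mask in zip(a_explore, a_exec, explore_masks):
        # Each token only sees the advantage of the track it belongs to.
        per_token.append([a_x if m else a_e for m in mask])
    return per_token


# Hypothetical group of two rollouts with three tokens each.
explore_rewards = [1.0, 0.0]  # rollout 0 found the right tables, rollout 1 did not
exec_rewards = [0.0, 1.0]     # rollout 1 produced a correct result, rollout 0 did not
explore_masks = [[1, 1, 0], [1, 0, 0]]

adv = dual_track_advantages(explore_rewards, exec_rewards, explore_masks)
```

In this toy group, the exploration tokens of rollout 0 receive a positive advantage even though its final query failed, while its SQL token receives a negative one. That is the kind of separation the abstract attributes to Dual-Track GRPO: with a single trajectory-level reward, both sets of tokens would share one signal.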
