R^3-SQL: Ranking Reward and Resampling for Text-to-SQL

Abstract

Modern Text-to-SQL systems generate multiple candidate SQL queries and rank them to select a final prediction. However, existing methods face two limitations. First, they often score functionally equivalent SQL queries inconsistently, even when their execution results are identical. Second, ranking cannot recover the correct answer when the correct SQL is absent from the candidate pool. We propose R^3-SQL, a Text-to-SQL framework that addresses both issues through a unified reward for ranking and resampling. R^3-SQL first groups candidates by execution result and ranks the groups rather than individual queries, so that equivalent queries are scored consistently. To score each group, it combines a pairwise preference across groups with a pointwise utility derived from the best group rank and the group size, capturing relative preference, consistency, and candidate quality. To improve candidate recall, R^3-SQL introduces agentic resampling, which judges the generated candidate pool and selectively resamples when the correct SQL is likely absent. R^3-SQL achieves 75.03% execution accuracy on BIRD-dev, a new state of the art among methods using models with disclosed sizes, with consistent gains across five benchmarks.
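The grouping step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `group_by_execution` and `rank_groups`, the toy table, and the candidate queries are all hypothetical, and the ranking here uses group size alone as a stand-in for the learned pairwise and pointwise reward, which the abstract does not specify in detail.

```python
import sqlite3
from collections import defaultdict


def group_by_execution(conn, candidates):
    """Group candidate SQL strings by their execution result."""
    groups = defaultdict(list)
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # drop candidates that fail to execute
        # Assumption: compare results as unordered row collections,
        # which ignores any ORDER BY semantics.
        key = tuple(sorted(rows, key=repr))
        groups[key].append(sql)
    return groups


def rank_groups(groups):
    """Placeholder ranking: larger groups first (not the paper's reward)."""
    return sorted(groups.values(), key=len, reverse=True)


# Toy database and hypothetical candidates for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

candidates = [
    "SELECT id FROM t WHERE v > 15",
    "SELECT id FROM t WHERE v >= 20",   # equivalent result to the first
    "SELECT id FROM t WHERE v > 25",
    "SELECT id FROM missing_table",     # fails to execute, skipped
]

groups = group_by_execution(conn, candidates)
ranked = rank_groups(groups)
prediction = ranked[0][0]  # any member of the top group
```

Because the first two candidates return the same rows, they fall into one group and receive the same rank, which is the consistency property the abstract attributes to ranking groups instead of individual queries.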
