R^3-SQL: Ranking Reward and Resampling for Text-to-SQL

Abstract

Modern Text-to-SQL systems generate multiple candidate SQL queries and rank them to select a final prediction. However, existing methods face two limitations. First, they often assign inconsistent scores to functionally equivalent SQL queries, even when those queries produce identical execution results. Second, ranking cannot recover when the correct SQL is absent from the candidate pool. We propose R^3-SQL, a Text-to-SQL framework that addresses both issues through a unified reward for ranking and resampling. R^3-SQL first groups candidates by execution result and ranks the groups rather than individual queries, so that equivalent queries are scored consistently. To score each group, it combines a pairwise preference across groups with a pointwise utility derived from the best group rank and size, capturing relative preference, consistency, and candidate quality. To improve candidate recall, R^3-SQL introduces agentic resampling, which judges the generated candidate pool and selectively resamples when the correct SQL is likely absent. R^3-SQL achieves 75.03 execution accuracy on BIRD-dev, a new state of the art among methods using models with disclosed sizes, and shows consistent gains across five benchmarks.
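The grouping step described above can be illustrated with a minimal sketch. It assumes an SQLite database and treats two candidates as equivalent when they return the same rows regardless of order. The function names, the toy `singer` table, and the candidate queries are all hypothetical. Ordering groups by size is only a placeholder for illustration; it is not the pairwise and pointwise reward that R^3-SQL uses.

```python
import sqlite3
from collections import defaultdict


def execution_signature(conn, sql):
    """Return a hashable, order-insensitive summary of a query's rows, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # candidates that fail to execute are dropped
    return tuple(sorted(rows, key=repr))


def group_by_execution(conn, candidates):
    """Group candidate SQL strings that yield the same execution result."""
    groups = defaultdict(list)
    for sql in candidates:
        signature = execution_signature(conn, sql)
        if signature is not None:
            groups[signature].append(sql)
    # Placeholder ordering (assumption): largest group first, not the paper's reward.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy database and candidate pool.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bo', 41), (3, 'Cy', 25);
""")
candidates = [
    "SELECT name FROM singer WHERE age > 28",
    "SELECT name FROM singer WHERE NOT age <= 28",  # equivalent to the first
    "SELECT name FROM singer WHERE age > 40",
    "SELECT nme FROM singer",  # invalid column, fails to execute
]
ranked_groups = group_by_execution(conn, candidates)
for group in ranked_groups:
    print(len(group), group)
```

Because scoring then happens per group, the two equivalent queries in this example necessarily receive the same score, which is the consistency property the abstract refers to.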
