SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Abstract

Natural Language to SQL (NL2SQL) enables intuitive interaction with databases by transforming natural language queries into structured SQL statements. Despite recent advances in human-computer interaction within database applications, significant challenges persist, particularly in reasoning performance on complex scenarios involving multi-table joins and nested queries. Current methods rely primarily on supervised fine-tuning (SFT) to train NL2SQL models, which may limit adaptability and interpretability in new environments (e.g., finance and healthcare). To improve the reasoning performance of NL2SQL models in these complex scenarios, we introduce SQL-R1, a novel NL2SQL reasoning model trained with reinforcement learning (RL) algorithms. We design a specialized RL reward function tailored to NL2SQL tasks and discuss the impact of cold start on the effectiveness of intensive training. In addition, we achieve competitive accuracy using only a small amount of synthetic NL2SQL data for augmented training, and we further explore data engineering for RL. In existing experiments, SQL-R1 achieves execution accuracy of 88.6% and 66.6% on the Spider and BIRD benchmarks, respectively, using only a 7B base model.
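The execution accuracy figures above refer to a metric that compares the result of running a predicted query with the result of running the reference query, rather than comparing the SQL text. The following is a minimal sketch of that idea using Python's built-in sqlite3 module. It is not the official Spider or BIRD evaluator: the helper names (execution_match, execution_accuracy, build_demo_db), the toy schema, and the order-insensitive row comparison are all assumptions made for illustration.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and return the same rows.

    Assumption: rows are compared as a multiset, ignoring order.
    Real benchmark evaluators may treat ordering differently.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Sort by repr so rows with NULLs or mixed types can still be ordered.
    return sorted(map(repr, predicted_rows)) == sorted(map(repr, gold_rows))


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
        INSERT INTO dept VALUES (1, 'Sales'), (2, 'R&D');
        INSERT INTO emp VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', 2);
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    pairs = [
        # Different SQL text (join vs. nested query), same result.
        (
            "SELECT e.name FROM emp e JOIN dept d ON e.dept_id = d.id "
            "WHERE d.name = 'R&D'",
            "SELECT name FROM emp WHERE dept_id IN "
            "(SELECT id FROM dept WHERE name = 'R&D')",
        ),
        # Runs, but returns a different count.
        ("SELECT COUNT(*) FROM emp", "SELECT COUNT(*) FROM emp WHERE dept_id = 1"),
        # Fails to execute (misspelled column).
        ("SELECT nme FROM emp", "SELECT name FROM emp"),
    ]
    print(f"execution accuracy: {execution_accuracy(conn, pairs):.3f}")
```

The first pair shows why the metric is defined on results: a multi-table join and a nested query are textually different but equivalent on this database, so both count as correct.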

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
