SQL-R1: Training Natural Language to SQL Reasoning Model by Reinforcement Learning

Abstract

Natural Language to SQL (NL2SQL) enables intuitive interaction with databases by transforming natural language queries into structured SQL statements. Despite recent advances in human-computer interaction within database applications, significant challenges persist, particularly in reasoning performance on complex scenarios involving multi-table joins and nested queries. Current methods primarily rely on supervised fine-tuning (SFT) to train NL2SQL models, which may limit adaptability and interpretability in new environments (e.g., finance and healthcare). To improve the reasoning performance of NL2SQL models in these complex scenarios, we introduce SQL-R1, a novel NL2SQL reasoning model trained with reinforcement learning (RL) algorithms. We design a specialized RL-based reward function tailored to NL2SQL tasks and discuss the impact of cold start on the effectiveness of intensive training. In addition, we achieve competitive accuracy using only a tiny amount of synthetic NL2SQL data for augmented training, and we further explore data engineering for RL. In our experiments, SQL-R1 achieves execution accuracy of 88.6% and 66.6% on the Spider and BIRD benchmarks, respectively, using only a 7B base model.
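
For readers unfamiliar with the metric, execution accuracy is commonly understood as the fraction of predicted queries whose execution result matches that of the reference query, regardless of how the SQL text itself is written. The following is a minimal sketch of that idea and not the official Spider or BIRD evaluation script: it assumes SQLite, compares results as order-insensitive multisets of rows, and uses a hypothetical toy schema (`dept`, `emp`) and hypothetical helper names (`run_query`, `execution_accuracy`).

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute sql; return rows as an order-insensitive multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result multisets match."""
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return correct / len(pairs) if pairs else 0.0


# Toy in-memory database (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'R&D');
    INSERT INTO emp VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', 2);
""")

pairs = [
    # Different SQL text (join vs. nested query), same result: counted as correct.
    ("SELECT e.name FROM emp e JOIN dept d ON e.dept_id = d.id WHERE d.name = 'R&D'",
     "SELECT name FROM emp WHERE dept_id IN (SELECT id FROM dept WHERE name = 'R&D')"),
    # Wrong filter: the result differs from the reference.
    ("SELECT name FROM emp WHERE dept_id = 1",
     "SELECT name FROM emp WHERE dept_id = 2"),
    # Invalid SQL: execution fails, so the prediction is counted as incorrect.
    ("SELEC name FROM emp", "SELECT name FROM emp"),
]
score = execution_accuracy(conn, pairs)
```

In this toy run only the first pair matches, so `score` is one third. Comparing rows as a multiset ignores row order, which would be too lenient for queries whose correctness depends on `ORDER BY`; a stricter comparison would be needed in that case.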