LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL

Abstract

Converting natural language questions into SQL queries (Text-to-SQL) enables non-expert users to interact with relational databases and has long been a central task for natural language interfaces to data. While the WikiSQL dataset played a key role in early NL2SQL research, its usage has declined due to structural and annotation issues, including case-sensitivity inconsistencies, data type mismatches, syntax errors, and unanswered questions. We present LLMSQL, a systematic revision and transformation of WikiSQL designed for the LLM era. We classify these errors and implement automated methods for cleaning and re-annotation. To assess the impact of these improvements, we evaluated multiple large language models (LLMs), including Gemma 3, LLaMA 3.2, Mistral 7B, gpt-oss 20B, Phi-3.5 Mini, Qwen 2.5, OpenAI o4-mini, DeepSeek R1, and others. LLMSQL is introduced not as a mere update but as an LLM-ready benchmark: whereas the original WikiSQL was tailored to pointer-network models that select tokens from the input, LLMSQL provides clean natural language questions and full SQL queries as plain text, enabling straightforward generation and evaluation for modern natural-language-to-SQL models.
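The contrast between pointer-network targets and plain-text SQL can be made concrete. The original WikiSQL release stores each query as a structured logical form rather than as SQL text. This form consists of a selected column index `sel`, an aggregation index `agg`, and a list of `[column, operator, value]` conditions `conds`. The sketch below is a minimal illustration under stated assumptions, not the LLMSQL authors' conversion pipeline. It renders such a logical form as a plain SQL string and scores a predicted query by comparing execution results in an in-memory SQLite database. The helper names (`wikisql_to_sql`, `execution_match`, `quote_ident`, `quote_value`), the identifier-quoting scheme, and the toy `players` table are hypothetical. Only the operator lists follow the original WikiSQL release.

```python
import sqlite3

# Operator vocabularies from the original WikiSQL release; "OP" is an unused placeholder.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


def quote_value(value) -> str:
    """Render a condition value as an SQL literal: numbers bare, strings single-quoted."""
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def wikisql_to_sql(sql: dict, header: list[str], table: str) -> str:
    """Hypothetical converter from a WikiSQL logical form to a plain SQL string."""
    col = quote_ident(header[sql["sel"]])
    agg = AGG_OPS[sql["agg"]]
    select = f"{agg}({col})" if agg else col
    query = f"SELECT {select} FROM {quote_ident(table)}"
    conds = [
        f"{quote_ident(header[c])} {COND_OPS[o]} {quote_value(v)}"
        for c, o, v in sql["conds"]
    ]
    if conds:
        query += " WHERE " + " AND ".join(conds)
    return query


def execution_match(db: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """Execution-based check: compare the result rows of two queries as multisets."""
    try:
        pred_rows = db.execute(predicted).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to execute counts as incorrect
    gold_rows = db.execute(gold).fetchall()
    # Sorting the repr strings avoids comparison errors between mixed value types.
    return sorted(map(repr, pred_rows)) == sorted(map(repr, gold_rows))


# Usage on a hypothetical toy table.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "players" ("Player" TEXT, "Team" TEXT, "Points" INTEGER)')
db.executemany(
    'INSERT INTO "players" VALUES (?, ?, ?)',
    [("Ann", "Hawks", 12), ("Bo", "Owls", 7), ("Cy", "Hawks", 3)],
)
gold = wikisql_to_sql(
    {"sel": 2, "agg": 4, "conds": [[1, 0, "Hawks"]]},
    ["Player", "Team", "Points"],
    "players",
)
print(gold)  # SELECT SUM("Points") FROM "players" WHERE "Team" = 'Hawks'
print(execution_match(db, "SELECT SUM(Points) FROM players WHERE Team = 'Hawks'", gold))  # True
```

Comparing result rows rather than query strings tolerates harmless surface differences, such as identifier quoting or keyword case, between a model's output and the reference query. The abstract does not say whether LLMSQL's own evaluation uses exactly this criterion.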
