LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL
September 27, 2025
Authors: Dzmitry Pihulski, Karol Charchut, Viktoria Novogrodskaia, Jan Kocoń
cs.AI
Abstract
Converting natural language questions into SQL queries (Text-to-SQL) enables
non-expert users to interact with relational databases and has long been a
central task for natural language interfaces to data. While the WikiSQL dataset
played a key role in early NL2SQL research, its usage has declined due to
structural and annotation issues, including case sensitivity inconsistencies,
data type mismatches, syntax errors, and unanswered questions. We present
LLMSQL, a systematic revision and transformation of WikiSQL designed for the
LLM era. We classify these errors and implement automated methods for cleaning
and re-annotation. To assess the impact of these improvements, we evaluate
multiple large language models (LLMs), including Gemma 3, LLaMA 3.2, Mistral
7B, gpt-oss 20B, Phi-3.5 Mini, Qwen 2.5, OpenAI o4-mini, DeepSeek R1, and
others. Rather than serving as an update, LLMSQL is introduced as an LLM-ready
benchmark: unlike the original WikiSQL, which was tailored to pointer-network
models that select tokens from the input, LLMSQL provides clean natural
language questions and full SQL queries as plain text, enabling straightforward
generation and evaluation for modern natural-language-to-SQL models.
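
To illustrate why plain-text SQL makes evaluation straightforward, here is a minimal sketch of an execution-based comparison between a generated query and a reference query. The abstract does not state which metric LLMSQL uses, so execution matching is an assumption here, and the table, the queries, and the helper names (`run_query`, `execution_match`) are hypothetical and not taken from the dataset.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows in a canonical order, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        # Syntax errors, unknown columns, and similar failures count as no result.
        return None
    # Sort by repr so that row order does not affect the comparison.
    return sorted(rows, key=repr)


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries execute and return the same multiset of rows."""
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, predicted_sql)
    return gold is not None and pred == gold


# Hypothetical toy table standing in for one benchmark table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 12), ("Bo", "Blue", 7), ("Cy", "Red", 3)],
)

gold = "SELECT name FROM players WHERE team = 'Red' AND points > 5"
good = "SELECT name FROM players WHERE points > 5 AND team = 'Red'"
bad = "SELECT name FROM players WHERE team = 'red'"  # wrong letter case
broken = "SELEC name FROM players"  # syntax error

print(execution_match(conn, good, gold))
print(execution_match(conn, bad, gold))
print(execution_match(conn, broken, gold))
```

In this sketch the `bad` query fails only because SQLite compares text case-sensitively by default, which is the kind of case sensitivity inconsistency the abstract lists among WikiSQL's annotation issues.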