LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL

Abstract

Converting natural language questions into SQL queries (Text-to-SQL) enables non-expert users to interact with relational databases and has long been a central task for natural language interfaces to data. While the WikiSQL dataset played a key role in early NL2SQL research, its usage has declined due to structural and annotation issues, including inconsistent case sensitivity, data type mismatches, syntax errors, and unanswered questions. We present LLMSQL, a systematic revision and transformation of WikiSQL designed for the LLM era. We classify these errors and implement automated methods for cleaning and re-annotation. To assess the impact of these improvements, we evaluate multiple large language models (LLMs), including Gemma 3, LLaMA 3.2, Mistral 7B, gpt-oss 20B, Phi-3.5 Mini, Qwen 2.5, OpenAI o4-mini, DeepSeek R1, and others. Rather than serving as an update, LLMSQL is introduced as an LLM-ready benchmark: unlike the original WikiSQL, which was tailored to pointer-network models that select tokens from the input, LLMSQL provides clean natural language questions and full SQL queries as plain text, enabling straightforward generation and evaluation for modern natural-language-to-SQL models.
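To make the idea of evaluating full SQL queries as plain text concrete, the following is a minimal sketch of an execution-based comparison, assuming a generated query counts as correct when it returns the same rows as the reference query. The abstract does not specify the evaluation protocol, so the helper `execution_match`, the `players` table, and its rows are hypothetical and not taken from LLMSQL. The example also shows how a letter-case mismatch in a string literal, one of the error types named above, changes the result in SQLite.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries run and yield the same rows (order ignored).

    Hypothetical helper for illustration; not the benchmark's official metric.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to parse or execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


# Toy in-memory table standing in for a single WikiSQL-style table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, position TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann Lee", "Guard", 12), ("Bo Chen", "Center", 30)],
)

gold = "SELECT name FROM players WHERE position = 'Guard'"
# Text comparison with '=' is case-sensitive in SQLite by default,
# so a literal with the wrong letter case matches no rows.
case_mismatch = "SELECT name FROM players WHERE position = 'guard'"
syntax_error = "SELEC name FROM players"

print(execution_match(conn, gold, gold))
print(execution_match(conn, case_mismatch, gold))
print(execution_match(conn, syntax_error, gold))
```

Comparing sorted row lists ignores row order, which is a design choice of this sketch; a stricter check would compare results in order for queries that specify an ordering.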
