
Knowledge Base Construction for Knowledge-Augmented Text-to-SQL

May 28, 2025
Authors: Jinheon Baek, Horst Samulowitz, Oktie Hassanzadeh, Dharmashankar Subramanian, Sola Shirai, Alfio Gliozzo, Debarun Bhattacharjya
cs.AI

Abstract

Text-to-SQL translates natural language queries into SQL statements, which is practical because it lets anyone easily retrieve the information they need from databases. Many recent approaches tackle this problem with Large Language Models (LLMs), leveraging their strong capability in understanding user queries and generating the corresponding SQL code. Yet the parametric knowledge in LLMs may not cover all the diverse, domain-specific queries that require grounding in various database schemas, which often makes the generated SQL less accurate. To tackle this, we propose constructing a knowledge base for text-to-SQL, a foundational source of knowledge from which we retrieve and generate the knowledge necessary for a given query. In particular, unlike existing approaches that either manually annotate knowledge or generate only a few pieces of knowledge for each query, our knowledge base is comprehensive: it is constructed from a combination of all the available questions, their associated database schemas, and their relevant knowledge, and it can be reused for unseen databases from different datasets and domains. We validate our approach on multiple text-to-SQL datasets, considering both overlapping and non-overlapping database scenarios, where it substantially outperforms relevant baselines.
