Knowledge Base Construction for Knowledge-Augmented Text-to-SQL

May 28, 2025
Authors: Jinheon Baek, Horst Samulowitz, Oktie Hassanzadeh, Dharmashankar Subramanian, Sola Shirai, Alfio Gliozzo, Debarun Bhattacharjya
cs.AI

Abstract

Text-to-SQL aims to translate natural language queries into SQL statements, which is practical because it lets anyone easily retrieve the desired information from databases. Many recent approaches tackle this problem with Large Language Models (LLMs), leveraging their strong capability to understand user queries and generate the corresponding SQL code. Yet the parametric knowledge in LLMs may not cover all the diverse, domain-specific queries that require grounding in various database schemas, which often makes the generated SQL less accurate. To tackle this, we propose constructing a knowledge base for text-to-SQL, a foundational source of knowledge from which we retrieve and generate the knowledge necessary for given queries. In particular, unlike existing approaches that either manually annotate knowledge or generate only a few pieces of knowledge for each query, our knowledge base is comprehensive: it is constructed from a combination of all the available questions, their associated database schemas, and their relevant knowledge, and it can be reused for unseen databases from different datasets and domains. We validate our approach on multiple text-to-SQL datasets, considering both overlapping and non-overlapping database scenarios, where it substantially outperforms relevant baselines.
