Knowledge Base Construction for Knowledge-Augmented Text-to-SQL

Abstract

Text-to-SQL aims to translate natural language queries into SQL statements, which is practical because it enables anyone to easily retrieve the desired information from databases. Recently, many approaches have tackled this problem with Large Language Models (LLMs), leveraging their strong capability to understand user queries and generate the corresponding SQL code. Yet the parametric knowledge in LLMs may be insufficient to cover the diverse, domain-specific queries that require grounding in various database schemas, which often makes the generated SQL less accurate. To address this, we propose constructing a knowledge base for text-to-SQL, a foundational source of knowledge from which we retrieve and generate the knowledge necessary for a given query. In particular, unlike existing approaches that either manually annotate knowledge or generate only a few pieces of knowledge for each query, our knowledge base is comprehensive: it is constructed from the combination of all available questions, their associated database schemas, and their relevant knowledge, and it can be reused for unseen databases from different datasets and domains. We validate our approach on multiple text-to-SQL datasets, considering both overlapping and non-overlapping database scenarios, and show that it substantially outperforms relevant baselines.
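The abstract does not specify how retrieval from the knowledge base is implemented. As a minimal sketch, assuming each knowledge-base entry stores a question, its database schema, and the associated knowledge, the snippet below ranks entries by similarity to a new question and schema, then assembles the top entries into a prompt from which an LLM could generate the knowledge needed for the new query. The names `KBEntry`, `retrieve`, and `build_prompt` are hypothetical, and the bag-of-words cosine similarity is a simple stand-in chosen for illustration, not the retriever used in this work; the LLM call itself is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class KBEntry:
    """Hypothetical knowledge-base entry: a question, its schema, and its knowledge."""
    question: str
    schema: str
    knowledge: str


def tokenize(text: str) -> Counter:
    # Lowercased word counts; an illustrative stand-in for a real text encoder.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(kb: list[KBEntry], question: str, schema: str, k: int = 2) -> list[KBEntry]:
    # Score every entry against the concatenated question + schema text
    # and return the k most similar entries.
    query = tokenize(question + " " + schema)
    ranked = sorted(
        kb,
        key=lambda e: cosine(query, tokenize(e.question + " " + e.schema)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(examples: list[KBEntry], question: str, schema: str) -> str:
    # Retrieved entries serve as demonstrations; the final block is left open
    # so that an LLM (not shown) can generate the knowledge for the new question.
    parts = [
        f"Schema: {e.schema}\nQuestion: {e.question}\nKnowledge: {e.knowledge}"
        for e in examples
    ]
    parts.append(f"Schema: {schema}\nQuestion: {question}\nKnowledge:")
    return "\n\n".join(parts)
```

For example, with a toy two-entry knowledge base (the entries are invented for illustration):

```python
kb = [
    KBEntry("How many schools have an enrollment above 500?",
            "schools(name, enrollment)",
            "Enrollment refers to schools.enrollment."),
    KBEntry("What is the average salary of engineers?",
            "employees(name, title, salary)",
            "Engineers are rows where title = 'Engineer'."),
]
top = retrieve(kb, "List schools with enrollment below 100",
               "schools(name, enrollment, city)", k=1)
print(top[0].knowledge)  # Enrollment refers to schools.enrollment.
```

Because the schema text is included in the similarity score, entries grounded in a similar schema are favored, which loosely mirrors the idea of pairing each question with its database schema in the knowledge base.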