Text2SQL is Not Enough: Unifying AI and Databases with TAG
August 27, 2024
Authors: Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia
cs.AI
Abstract
AI systems that serve natural language questions over databases promise to
unlock tremendous value. Such systems would allow users to leverage the
powerful reasoning and knowledge capabilities of language models (LMs)
alongside the scalable computational power of data management systems. These
combined capabilities would empower users to ask arbitrary natural language
questions over custom data sources. However, existing methods and benchmarks
insufficiently explore this setting. Text2SQL methods focus solely on natural
language questions that can be expressed in relational algebra, representing a
small subset of the questions real users wish to ask. Likewise,
Retrieval-Augmented Generation (RAG) considers the limited subset of queries
that can be answered with point lookups to one or a few data records within the
database. We propose Table-Augmented Generation (TAG), a unified and
general-purpose paradigm for answering natural language questions over
databases. The TAG model represents a wide range of interactions between the LM
and database that have been previously unexplored and creates exciting research
opportunities for leveraging the world knowledge and reasoning capabilities of
LMs over data. We systematically develop benchmarks to study the TAG problem
and find that standard methods answer no more than 20% of queries correctly,
confirming the need for further research in this area. We release code for the
benchmark at https://github.com/TAG-Research/TAG-Bench.
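To make the gap described in the abstract concrete, the following is a minimal sketch of a question that mixes exact computation over a table with knowledge that is stored in no column. It is not the authors' implementation or the TAG-Bench code. The table, its rows and revenue figures, and the function names `lm_is_classic`, `build_db`, and `answer_top_classic_romance` are all invented for illustration, and the "language model" is a hard-coded stub so that the example runs offline.

```python
import sqlite3
from typing import Optional


def lm_is_classic(title: str) -> bool:
    """Hypothetical stand-in for a language model call.

    A real system would ask an LM whether the film is considered a classic.
    The hard-coded set below only simulates that world knowledge.
    """
    return title in {"Casablanca", "Roman Holiday"}


def build_db() -> sqlite3.Connection:
    """Create an in-memory table of made-up movies (toy data, not real figures)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (title TEXT, genre TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?, ?)",
        [
            ("Casablanca", "romance", 10.2),
            ("Roman Holiday", "romance", 12.0),
            ("Generic Rom-Com", "romance", 50.0),
            ("Space Battle", "sci-fi", 90.0),
        ],
    )
    return conn


def answer_top_classic_romance(conn: sqlite3.Connection) -> Optional[str]:
    """Answer: which is the highest-revenue romance movie considered a classic?"""
    # Relational part: the database does the exact filtering and ordering.
    rows = conn.execute(
        "SELECT title FROM movies WHERE genre = 'romance' ORDER BY revenue DESC"
    ).fetchall()
    # Semantic part: "considered a classic" is not a column, so the predicate
    # is delegated to the (stubbed) language model, row by row.
    for (title,) in rows:
        if lm_is_classic(title):
            return title
    return None


if __name__ == "__main__":
    print(answer_top_classic_romance(build_db()))
```

In this toy setup, SQL alone would return the top-revenue romance regardless of its reputation, and a point lookup over a few records would not guarantee the correct ordering. Combining the two is the kind of interaction the abstract argues existing methods and benchmarks leave unexplored.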