Text2SQL is Not Enough: Unifying AI and Databases with TAG
August 27, 2024
Authors: Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia
cs.AI
Abstract
AI systems that serve natural language questions over databases promise to
unlock tremendous value. Such systems would allow users to leverage the
powerful reasoning and knowledge capabilities of language models (LMs)
alongside the scalable computational power of data management systems. These
combined capabilities would empower users to ask arbitrary natural language
questions over custom data sources. However, existing methods and benchmarks
insufficiently explore this setting. Text2SQL methods focus solely on natural
language questions that can be expressed in relational algebra, representing a
small subset of the questions real users wish to ask. Likewise,
Retrieval-Augmented Generation (RAG) considers the limited subset of queries
that can be answered with point lookups to one or a few data records within the
database. We propose Table-Augmented Generation (TAG), a unified and
general-purpose paradigm for answering natural language questions over
databases. The TAG model represents a wide range of interactions between the LM
and database that have been previously unexplored and creates exciting research
opportunities for leveraging the world knowledge and reasoning capabilities of
LMs over data. We systematically develop benchmarks to study the TAG problem
and find that standard methods answer no more than 20% of queries correctly,
confirming the need for further research in this area. We release code for the
benchmark at https://github.com/TAG-Research/TAG-Bench.
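
To make the gap described in the abstract concrete, here is a minimal sketch of a question that mixes a relational filter with a judgement that no column encodes. It is an illustration under stated assumptions, not the benchmark's code or the authors' implementation: the `reviews` table, its rows, and the helper names are all hypothetical, and `toy_lm_is_positive` is a keyword check standing in for a real language model call.

```python
import sqlite3


def toy_lm_is_positive(review: str) -> bool:
    """Hypothetical stand-in for an LM call: a keyword check, not a real model."""
    return any(word in review.lower() for word in ("great", "love"))


def build_db() -> sqlite3.Connection:
    """Create an in-memory table of made-up product reviews (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (product TEXT, price REAL, review TEXT)")
    conn.executemany(
        "INSERT INTO reviews VALUES (?, ?, ?)",
        [
            ("kettle", 15.0, "Great value, boils fast"),
            ("toaster", 18.0, "Broke after a week"),
            ("blender", 45.0, "I love this blender"),
            ("mug", 5.0, "Love the colour"),
        ],
    )
    return conn


def cheap_products_with_positive_reviews(conn, is_positive, max_price):
    """Answer: 'Which products under max_price have positive reviews?'"""
    # Relational part: the price filter is expressible in plain SQL.
    rows = conn.execute(
        "SELECT product, review FROM reviews WHERE price < ? ORDER BY product",
        (max_price,),
    ).fetchall()
    # Semantic part: 'positive' is not a column, so each row's text is
    # judged by the (stubbed) language model instead of by SQL.
    return [product for product, review in rows if is_positive(review)]


if __name__ == "__main__":
    conn = build_db()
    print(cheap_products_with_positive_reviews(conn, toy_lm_is_positive, 20.0))
    # prints ['kettle', 'mug']
```

The price filter alone could be handled by a Text2SQL system, and a single review could be fetched by a point lookup, but the combined question needs the database to narrow the rows and the model to reason over every remaining row.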