HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies
June 16, 2024
Authors: William Watson, Nicole Cho, Tucker Balch, Manuela Veloso
cs.AI
Abstract
A myriad of different Large Language Models (LLMs) face common challenges in
contextually analyzing table question-answering tasks. These challenges are
engendered from (1) finite context windows for large tables, (2) multi-faceted
discrepancies amongst tokenization patterns against cell boundaries, and (3)
various limitations stemming from data confidentiality in the process of using
external models such as gpt-3.5-turbo. We propose a cooperative game dubbed
"HiddenTables" as a potential resolution to these challenges. In essence,
"HiddenTables" is played between the code-generating LLM "Solver" and the
"Oracle" which evaluates the ability of the LLM agents to solve Table QA tasks.
This game is based on natural language schemas and importantly, ensures the
security of the underlying data. We provide evidential experiments on a diverse
set of tables that demonstrate LLMs' collective inability to generalize and
perform on complex queries, handle compositional dependencies, and align
natural language to programmatic commands when concrete table schemas are
provided. Unlike encoder-based models, we have pushed the boundaries of
"HiddenTables" so that it is not limited by the number of rows; we therefore exhibit
improved efficiency in prompt and completion tokens. Our infrastructure has
spawned a new dataset "PyQTax" that spans across 116,671 question-table-answer
triplets and provides additional fine-grained breakdowns & labels for varying
question taxonomies. Therefore, in tandem with our academic contributions
regarding LLMs' deficiency in TableQA tasks, "HiddenTables" is a tangible
manifestation of how LLMs can interact with massive datasets while ensuring
data security and minimizing generation costs.
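
The Solver/Oracle interaction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names (`Oracle`, `stub_solver`, `play_round`), the schema format, the toy table, and the use of `exec` are all assumptions made for illustration, and the Solver here is a deterministic stub standing in for a code-generating LLM. The point it shows is that only the question and the schema reach the Solver, while the table rows stay with the Oracle.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Oracle:
    """Holds the private table; only the schema leaves this object (assumed design)."""
    rows: List[Row]

    def schema(self) -> Dict[str, str]:
        # Expose column names and type names only, never cell values.
        return {name: type(value).__name__ for name, value in self.rows[0].items()}

    def evaluate(self, code: str, expected: Any) -> bool:
        # Run the Solver's code locally against the hidden rows.
        # Assumption: a real system would sandbox this instead of calling exec.
        namespace: Dict[str, Any] = {}
        exec(code, namespace)
        return namespace["solve"](self.rows) == expected


def stub_solver(question: str, schema: Dict[str, str]) -> str:
    """Hypothetical stand-in for the code-generating LLM.

    It sees only the question and the schema, and returns Python source
    defining solve(rows). This stub always emits an arg-max query.
    """
    numeric = next(c for c, t in schema.items() if t in ("int", "float"))
    label = next(c for c, t in schema.items() if t == "str")
    return (
        "def solve(rows):\n"
        f"    best = max(rows, key=lambda r: r[{numeric!r}])\n"
        f"    return best[{label!r}]\n"
    )


def play_round(oracle: Oracle, solver: Callable[[str, Dict[str, str]], str],
               question: str, expected: Any) -> bool:
    # The Solver receives the schema, not the data; the Oracle scores the result.
    code = solver(question, oracle.schema())
    return oracle.evaluate(code, expected)


# Toy table with made-up values, used only to exercise the sketch.
oracle = Oracle(rows=[
    {"city": "A", "population": 10},
    {"city": "B", "population": 30},
    {"city": "C", "population": 20},
])
solved = play_round(oracle, stub_solver, "Which city has the largest population?", "B")
```

Because the generated code runs on the Oracle's side, the prompt size depends on the schema rather than on the number of rows, which is consistent with the efficiency claim in the abstract.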