HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies
June 16, 2024
Authors: William Watson, Nicole Cho, Tucker Balch, Manuela Veloso
cs.AI
Abstract
A myriad of Large Language Models (LLMs) face common challenges in
contextually analyzing table question-answering tasks. These challenges
stem from (1) finite context windows for large tables, (2) multi-faceted
discrepancies between tokenization patterns and cell boundaries, and (3)
various limitations arising from data confidentiality in the process of using
external models such as gpt-3.5-turbo. We propose a cooperative game dubbed
"HiddenTables" as a potential resolution to these challenges. In essence,
"HiddenTables" is played between the code-generating LLM "Solver" and the
"Oracle", which evaluates the ability of the LLM agents to solve Table QA tasks.
This game is based on natural language schemas and, importantly, ensures the
security of the underlying data. We provide experimental evidence on a diverse
set of tables that demonstrates LLMs' collective inability to generalize and
perform on complex queries, handle compositional dependencies, and align
natural language to programmatic commands when concrete table schemas are
provided. Unlike encoder-based models, "HiddenTables" is not limited by the
number of rows, and it therefore exhibits improved efficiency in prompt and
completion tokens. Our infrastructure has
spawned a new dataset "PyQTax" that spans 116,671 question-table-answer
triplets and provides additional fine-grained breakdowns and labels for varying
question taxonomies. Therefore, in tandem with our academic contributions
regarding LLMs' deficiency in TableQA tasks, "HiddenTables" is a tangible
manifestation of how LLMs can interact with massive datasets while ensuring
data security and minimizing generation costs.
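To make the game described in the abstract concrete, the following is a minimal sketch of one round, not the authors' implementation. It assumes the Solver receives only the question and a schema (column names and types), returns code as a string, and that the Oracle runs that code against a table the Solver never sees. All names here (`make_schema`, `stub_solver`, `oracle_run`, `play`, `HIDDEN_TABLE`) are hypothetical, and `stub_solver` is a hard-coded stand-in for a real code-generating LLM.

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical hidden table; only the Oracle ever holds these rows.
HIDDEN_TABLE: list[Row] = [
    {"company": "Acme", "revenue": 120.0},
    {"company": "Globex", "revenue": 340.5},
    {"company": "Initech", "revenue": 87.25},
]


def make_schema(table: list[Row]) -> dict[str, str]:
    """Describe the table by column name and type only; no cell values leak."""
    return {col: type(val).__name__ for col, val in table[0].items()}


def stub_solver(question: str, schema: dict[str, str]) -> str:
    """Stand-in for the code-generating LLM (assumption: a real Solver would
    be prompted with the question and schema and would return code like this)."""
    return "answer = max(rows, key=lambda r: r['revenue'])['company']"


def oracle_run(code: str, table: list[Row]) -> Any:
    """Execute the Solver's code against the hidden table and return `answer`.

    Restricting builtins narrows what the code can call, but it is NOT a
    real sandbox; a production Oracle would need proper isolation.
    """
    scope: dict[str, Any] = {
        "__builtins__": {"max": max, "min": min, "sum": sum, "len": len},
        "rows": table,
    }
    exec(code, scope)
    return scope.get("answer")


def play(question: str, table: list[Row],
         solver: Callable[[str, dict[str, str]], str], expected: Any) -> bool:
    """One round: the Solver sees the schema only; the Oracle checks the answer."""
    schema = make_schema(table)
    code = solver(question, schema)
    return oracle_run(code, table) == expected
```

Calling `play("Which company has the highest revenue?", HIDDEN_TABLE, stub_solver, "Globex")` runs one round. Because the prompt holds only the schema, its size in this sketch does not grow with the number of rows, which is the efficiency argument the abstract makes.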