HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies

Abstract

Many Large Language Models (LLMs) face common challenges when analyzing table question-answering (TableQA) tasks in context. These challenges stem from (1) finite context windows for large tables, (2) multi-faceted discrepancies between tokenization patterns and cell boundaries, and (3) various limitations arising from data confidentiality when using external models such as gpt-3.5-turbo. We propose a cooperative game dubbed "HiddenTables" as a potential resolution to these challenges. In essence, "HiddenTables" is played between a code-generating LLM, the "Solver", and an "Oracle" that evaluates the ability of LLM agents to solve TableQA tasks. The game is based on natural language schemas and, importantly, ensures the security of the underlying data. We provide empirical experiments on a diverse set of tables that demonstrate LLMs' collective inability to generalize and perform on complex queries, handle compositional dependencies, and align natural language to programmatic commands when concrete table schemas are provided. Unlike encoder-based models, "HiddenTables" is not limited by the number of rows, and therefore exhibits improved efficiency in prompt and completion tokens. Our infrastructure has produced a new dataset, "PyQTax", that spans 116,671 question-table-answer triplets and provides additional fine-grained breakdowns and labels for varying question taxonomies. Therefore, in tandem with our academic contributions regarding LLMs' deficiency in TableQA tasks, "HiddenTables" is a concrete demonstration of how LLMs can interact with massive datasets while ensuring data security and minimizing generation costs.
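As a rough illustration of the game described above, the following minimal Python sketch shows one way the Solver/Oracle exchange could look. It is not the authors' implementation: the names `Oracle`, `stub_solver`, and `play` are hypothetical, the Solver is a hard-coded stub standing in for a code-generating LLM, and the toy table is invented for the example. The point it tries to show is that only the schema and the question reach the Solver, while the rows stay with the Oracle.

```python
from typing import Any, Callable

Row = dict[str, Any]


class Oracle:
    """Holds the table privately; exposes only the schema and final answers."""

    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows  # never sent to the Solver

    def schema(self) -> dict[str, str]:
        # Column names and Python type names only; no cell values are included.
        first = self._rows[0]
        return {col: type(val).__name__ for col, val in first.items()}

    def run(self, code: str) -> Any:
        # Execute Solver-written code locally; it must define answer(rows).
        # Assumption: exec is used for brevity. It is NOT a sandbox, and a
        # real deployment would need proper isolation.
        namespace: dict[str, Any] = {}
        exec(code, namespace)
        return namespace["answer"](self._rows)


def stub_solver(schema: dict[str, str], question: str) -> str:
    """Hypothetical stand-in for the LLM; sees the schema and question only."""
    assert "revenue" in schema
    return "def answer(rows):\n    return sum(r['revenue'] for r in rows)\n"


def play(oracle: Oracle, solver: Callable[[dict[str, str], str], str],
         question: str) -> Any:
    # One round: schema goes out, code comes back, the Oracle runs it.
    code = solver(oracle.schema(), question)
    return oracle.run(code)


# Toy table, invented for illustration.
oracle = Oracle([
    {"region": "EU", "revenue": 120},
    {"region": "US", "revenue": 200},
])
result = play(oracle, stub_solver, "What is the total revenue?")
print(result)
```

In this sketch the text sent to the Solver depends on the number of columns rather than the number of rows, which is the intuition behind the claim that the approach is not limited by table length.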
