HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies

Abstract

Large Language Models (LLMs) of many kinds face common challenges in contextually analyzing table question-answering (TableQA) tasks. These challenges stem from (1) finite context windows for large tables, (2) multi-faceted discrepancies between tokenization patterns and cell boundaries, and (3) various limitations arising from data confidentiality when using external models such as gpt-3.5-turbo. We propose a cooperative game dubbed "HiddenTables" as a potential resolution to these challenges. In essence, "HiddenTables" is played between the code-generating LLM "Solver" and the "Oracle", which evaluates the ability of the LLM agents to solve TableQA tasks. The game is based on natural language schemas and, importantly, ensures the security of the underlying data. We provide experiments on a diverse set of tables that demonstrate LLMs' collective inability to generalize and perform on complex queries, handle compositional dependencies, and align natural language to programmatic commands when concrete table schemas are provided. Unlike encoder-based models, "HiddenTables" is not limited by the number of rows, and it therefore exhibits improved efficiency in prompt and completion tokens. Our infrastructure has produced a new dataset, "PyQTax", that spans 116,671 question-table-answer triplets and provides additional fine-grained breakdowns and labels for varying question taxonomies. Therefore, in tandem with our academic contributions regarding LLMs' deficiency in TableQA tasks, "HiddenTables" is a tangible demonstration of how LLMs can interact with massive datasets while ensuring data security and minimizing generation costs.
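The Solver/Oracle interaction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the table contents, the schema format, and every function name here (`make_schema`, `stub_solver`, `oracle_run`, `play_round`) are hypothetical, and the Solver is a hard-coded stub standing in for a code-generating LLM. The point is only to show the information flow: the Solver sees a schema and a question, never the rows, while the Oracle runs the generated code against the hidden data.

```python
# Illustrative sketch only; all names and data below are assumptions.

# Hidden table: in this sketch it lives only on the Oracle side.
HIDDEN_ROWS = [
    {"city": "Oslo", "population": 700000},
    {"city": "Bergen", "population": 290000},
    {"city": "Trondheim", "population": 210000},
]


def make_schema(rows):
    """Describe column names and value types without exposing cell values."""
    return {name: type(value).__name__ for name, value in rows[0].items()}


def stub_solver(schema, question):
    """Stand-in for the code-generating LLM.

    It receives only the schema and the question, and returns Python
    source that defines answer(rows). A real Solver would generate this
    source from the question; here it is hard-coded.
    """
    assert "population" in schema and "city" in schema
    return (
        "def answer(rows):\n"
        "    return max(rows, key=lambda r: r['population'])['city']\n"
    )


def oracle_run(code, rows):
    """Oracle side: execute the Solver's code against the hidden rows.

    exec() on untrusted code is unsafe; a real system would need a sandbox.
    """
    namespace = {}
    exec(code, namespace)
    return namespace["answer"](rows)


def play_round(question, rows, solver):
    """One round: only the schema crosses over to the Solver."""
    schema = make_schema(rows)
    code = solver(schema, question)
    return oracle_run(code, rows)


result = play_round(
    "Which city has the largest population?", HIDDEN_ROWS, stub_solver
)
print(result)
```

Because the prompt in this sketch carries only the schema, its size does not grow with the number of rows, which is consistent with the token-efficiency point made in the abstract.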
