arXiVeri: Automatic table verification with GPT

Abstract

Without accurate transcription of numerical data in scientific documents, a scientist cannot draw accurate conclusions. Unfortunately, the process of copying numerical data from one paper to another is prone to human error. In this paper, we propose to meet this challenge through the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources. To support this task, we propose a new benchmark, arXiVeri, which comprises tabular data drawn from open-access academic papers on arXiv. We introduce metrics to evaluate the performance of a table verifier in two key areas: (i) table matching, which aims to identify the source table in a cited document that corresponds to a target table, and (ii) cell matching, which aims to locate shared cells between a target and source table and identify their row and column indices accurately. By leveraging the flexible capabilities of modern large language models (LLMs), we propose simple baselines for table verification. Our findings highlight the complexity of this task, even for state-of-the-art LLMs like OpenAI's GPT-4. The code and benchmark will be made publicly available.
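To make the cell matching task concrete, the following is a minimal sketch of what locating shared cells between a target and a source table could look like. It is not the baseline described in the abstract, which relies on LLMs; it assumes, purely for illustration, that tables are lists of rows of strings and that two cells are "shared" when they parse to the same number. The names `parse_number` and `match_cells` and the example tables are hypothetical.

```python
from typing import List, Optional, Tuple

Table = List[List[str]]
Index = Tuple[int, int]  # (row, column)


def parse_number(cell: str) -> Optional[float]:
    """Return the cell as a float, or None if it is not numeric."""
    try:
        return float(cell.strip().rstrip("%"))
    except ValueError:
        return None


def match_cells(target: Table, source: Table, tol: float = 1e-9) -> List[Tuple[Index, Index]]:
    """Pair every numeric target cell with every source cell holding the same value.

    Assumption: exact numeric equality (within tol) stands in for the
    benchmark's notion of a shared cell; real tables need fuzzier matching.
    """
    matches = []
    for ti, trow in enumerate(target):
        for tj, tcell in enumerate(trow):
            tval = parse_number(tcell)
            if tval is None:
                continue  # skip headers and other non-numeric cells
            for si, srow in enumerate(source):
                for sj, scell in enumerate(srow):
                    sval = parse_number(scell)
                    if sval is not None and abs(tval - sval) <= tol:
                        matches.append(((ti, tj), (si, sj)))
    return matches


# Hypothetical tables: the target paper copies one result from the source paper.
target = [["Method", "Acc"], ["A", "71.2"], ["B", "68.5"]]
source = [["Model", "Top-1", "Params"], ["A", "71.2", "25"]]

print(match_cells(target, source))
```

A brute-force comparison like this ignores rounding, unit changes, and reordered rows or columns, which is part of why the task is difficult and why flexible LLM-based matching is a natural choice.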
