arXiVeri: Automatic table verification with GPT

Abstract

Without accurate transcription of numerical data in scientific documents, a scientist cannot draw accurate conclusions. Unfortunately, the process of copying numerical data from one paper to another is prone to human error. In this paper, we propose to meet this challenge through the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources. To support this task, we propose a new benchmark, arXiVeri, which comprises tabular data drawn from open-access academic papers on arXiv. We introduce metrics to evaluate the performance of a table verifier in two key areas: (i) table matching, which aims to identify the source table in a cited document that corresponds to a target table, and (ii) cell matching, which aims to locate shared cells between a target and source table and identify their row and column indices accurately. By leveraging the flexible capabilities of modern large language models (LLMs), we propose simple baselines for table verification. Our findings highlight the complexity of this task, even for state-of-the-art LLMs like OpenAI's GPT-4. The code and benchmark will be made publicly available.
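
To make the cell-matching task concrete, here is a minimal Python sketch. It is not the paper's baseline, which relies on LLMs. It is a naive exact-value matcher written under stated assumptions: each table is a list of rows of strings, and two cells count as "shared" when they parse to the same number. The names `parse_number` and `match_cells` are hypothetical and do not come from the arXiVeri code.

```python
from typing import Dict, List, Optional, Tuple

Table = List[List[str]]
Index = Tuple[int, int]  # (row, column), zero-based


def parse_number(cell: str) -> Optional[float]:
    """Return the cell as a float, or None if it is not numeric (assumed format)."""
    try:
        return float(cell.strip().rstrip("%"))
    except ValueError:
        return None


def match_cells(target: Table, source: Table) -> List[Tuple[Index, Index]]:
    """Pair each numeric target cell with every source cell holding the same value."""
    # Index the source table by numeric value for constant-time lookup.
    by_value: Dict[float, List[Index]] = {}
    for r, row in enumerate(source):
        for c, cell in enumerate(row):
            value = parse_number(cell)
            if value is not None:
                by_value.setdefault(value, []).append((r, c))

    # Look up every numeric target cell in the source index.
    matches: List[Tuple[Index, Index]] = []
    for r, row in enumerate(target):
        for c, cell in enumerate(row):
            value = parse_number(cell)
            if value is not None:
                for source_index in by_value.get(value, []):
                    matches.append(((r, c), source_index))
    return matches


# Hypothetical tables, invented for illustration only.
target = [["Method", "Acc"], ["Ours", "81.2"], ["Baseline", "76.5"]]
source = [["Model", "Top-1", "Top-5"], ["Baseline", "76.5", "93.1"]]
print(match_cells(target, source))  # [((2, 1), (1, 1))]
```

Exact matching of this kind cannot tell a transcription error from a cell that was never shared, and it produces spurious pairs when the same number appears more than once. This hints at why the task remains difficult even for strong LLMs.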
