arXiVeri: Automatic table verification with GPT

Abstract

Without accurate transcription of numerical data in scientific documents, a scientist cannot draw accurate conclusions. Unfortunately, copying numerical data from one paper to another is prone to human error. In this paper, we propose to address this challenge through the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing the cited sources. To support this task, we propose a new benchmark, arXiVeri, which comprises tabular data drawn from open-access academic papers on arXiv. We introduce metrics to evaluate the performance of a table verifier in two key areas: (i) table matching, which aims to identify the source table in a cited document that corresponds to a target table, and (ii) cell matching, which aims to locate the cells shared between a target table and a source table and to identify their row and column indices accurately. By leveraging the flexible capabilities of modern large language models (LLMs), we propose simple baselines for table verification. Our findings highlight the complexity of this task, even for state-of-the-art LLMs such as OpenAI's GPT-4. The code and benchmark will be made publicly available.
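To make the cell matching objective concrete, the following is a minimal Python sketch of what "locating shared cells and their row and column indices" could look like for two tables stored as lists of rows. It is not the LLM-based baseline proposed in the paper: the function names `parse_number` and `match_cells`, the list-of-lists table representation, and the rule that two cells match when their numeric values are equal are all assumptions made for illustration.

```python
from decimal import Decimal, InvalidOperation


def parse_number(cell):
    """Return the cell text as a Decimal, or None if it is not a finite number."""
    try:
        value = Decimal(cell.strip().rstrip("%"))
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def match_cells(target, source):
    """Pair cells that hold the same numeric value in both tables.

    Hypothetical helper: returns ((target_row, target_col), (source_row, source_col))
    tuples using zero-based indices. Matching by exact numeric equality is an
    assumption of this sketch, not the paper's method.
    """
    # Index every numeric cell of the source table by its value.
    index = {}
    for r, row in enumerate(source):
        for c, cell in enumerate(row):
            value = parse_number(cell)
            if value is not None:
                index.setdefault(value, []).append((r, c))

    # Look up each numeric cell of the target table in that index.
    matches = []
    for r, row in enumerate(target):
        for c, cell in enumerate(row):
            value = parse_number(cell)
            for source_position in index.get(value, []):
                matches.append(((r, c), source_position))
    return matches


# Made-up example tables, not data from the benchmark.
target = [["Method", "Acc"], ["A", "71.3"], ["B", "80.1"]]
source = [["Model", "Top-1", "Top-5"], ["A", "71.30", "90.2"]]
print(match_cells(target, source))
```

Exact value matching of this kind returns every source cell that happens to share a value with a target cell, so it cannot tell a genuine copy from a coincidence, which is one reason the task is harder than it first appears.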
