

ChartAB: A Benchmark for Chart Grounding & Dense Alignment

October 30, 2025
Authors: Aniruddh Bansal, Davit Soselia, Dang Nguyen, Tianyi Zhou
cs.AI

Abstract

Charts play an important role in visualization, reasoning, data analysis, and the exchange of ideas among humans. However, existing vision-language models (VLMs) still lack accurate perception of details and struggle to extract fine-grained structures from charts. Such limitations in chart grounding also hinder their ability to compare multiple charts and reason over them. In this paper, we introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a comprehensive evaluation of VLMs in chart grounding tasks, i.e., extracting tabular data, localizing visualization elements, and recognizing various attributes from charts of diverse types and complexities. We design a JSON template to facilitate the calculation of evaluation metrics specifically tailored for each grounding task. By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to align and compare elements/attributes across two charts. Our analysis of evaluations on several recent VLMs reveals new insights into their perception biases, weaknesses, robustness, and hallucinations in chart understanding. These findings highlight the fine-grained discrepancies among VLMs in chart understanding tasks and point to specific skills that need to be strengthened in current models.
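To make the JSON-template idea concrete, here is a minimal sketch of how a grounding template could pair with a per-field evaluation metric. The schema, field names, and `field_accuracy` metric below are illustrative assumptions for a bar chart, not the benchmark's actual specification.

```python
# Hypothetical ground-truth grounding for a bar chart (assumed schema).
reference = {
    "chart_type": "bar",
    "title": "Quarterly Revenue",
    "x_axis": {"label": "Quarter", "ticks": ["Q1", "Q2", "Q3", "Q4"]},
    "y_axis": {"label": "Revenue (M$)"},
    "data": [{"series": "2024", "values": [1.2, 1.5, 1.4, 1.9]}],
}

# Hypothetical VLM output with one misread value (1.9 -> 1.8),
# simulating the fine-grained perception errors the paper studies.
prediction = {
    "chart_type": "bar",
    "title": "Quarterly Revenue",
    "x_axis": {"label": "Quarter", "ticks": ["Q1", "Q2", "Q3", "Q4"]},
    "y_axis": {"label": "Revenue (M$)"},
    "data": [{"series": "2024", "values": [1.2, 1.5, 1.4, 1.8]}],
}

def field_accuracy(pred, ref):
    """Fraction of leaf fields in `ref` that `pred` reproduces exactly."""
    def leaves(node, path=""):
        # Flatten nested dicts/lists into (path, value) leaf pairs.
        if isinstance(node, dict):
            for k, v in node.items():
                yield from leaves(v, f"{path}.{k}")
        elif isinstance(node, list):
            for i, v in enumerate(node):
                yield from leaves(v, f"{path}[{i}]")
        else:
            yield path, node

    ref_leaves = dict(leaves(ref))
    pred_leaves = dict(leaves(pred))
    hits = sum(1 for k, v in ref_leaves.items() if pred_leaves.get(k) == v)
    return hits / len(ref_leaves)

print(f"field accuracy: {field_accuracy(prediction, reference):.3f}")  # 12/13 leaves match
```

Aligning prediction and reference on a shared schema is what lets each grounding task (data extraction, element localization, attribute recognition) score only the fields it targets.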
December 2, 2025