ChartAB: A Benchmark for Chart Grounding & Dense Alignment
October 30, 2025
Authors: Aniruddh Bansal, Davit Soselia, Dang Nguyen, Tianyi Zhou
cs.AI
Abstract
Charts play an important role in visualization, reasoning, data analysis, and
the exchange of ideas among humans. However, existing vision-language models
(VLMs) still lack accurate perception of details and struggle to extract
fine-grained structures from charts. Such limitations in chart grounding also
hinder their ability to compare multiple charts and reason over them. In this
paper, we introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a
comprehensive evaluation of VLMs in chart grounding tasks, i.e., extracting
tabular data, localizing visualization elements, and recognizing various
attributes from charts of diverse types and complexities. We design a JSON
template to facilitate the calculation of evaluation metrics specifically
tailored for each grounding task. By incorporating a novel two-stage inference
workflow, the benchmark can further evaluate VLMs' capability to align and
compare elements/attributes across two charts. Our analysis of evaluations on
several recent VLMs reveals new insights into their perception biases,
weaknesses, robustness, and hallucinations in chart understanding. These
findings highlight the fine-grained discrepancies among VLMs in chart
understanding tasks and point to specific skills that need to be strengthened
in current models.
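To make the benchmark's setup concrete, the sketch below shows what a JSON-style grounding record and its per-task metrics could look like. The schema here (field names like `table` and `attributes`, and exact-match accuracy as the metric) is an illustrative assumption, not the actual ChartAB template, which the abstract does not specify.

```python
# Hypothetical JSON grounding template in the spirit of ChartAB
# (schema and metric choices are assumptions for illustration).
ground_truth = {
    "chart_type": "bar",
    "table": {"Q1": 10, "Q2": 15, "Q3": 12},
    "attributes": {"Q1": {"color": "blue"},
                   "Q2": {"color": "red"},
                   "Q3": {"color": "blue"}},
}
prediction = {
    "chart_type": "bar",
    "table": {"Q1": 10, "Q2": 14, "Q3": 12},
    "attributes": {"Q1": {"color": "blue"},
                   "Q2": {"color": "red"},
                   "Q3": {"color": "green"}},
}

def cell_accuracy(gt, pred):
    """Fraction of table cells whose extracted value matches exactly."""
    hits = sum(pred["table"].get(k) == v for k, v in gt["table"].items())
    return hits / len(gt["table"])

def attribute_accuracy(gt, pred, attr):
    """Fraction of elements whose recognized attribute matches."""
    hits = sum(pred["attributes"].get(k, {}).get(attr) == v[attr]
               for k, v in gt["attributes"].items())
    return hits / len(gt["attributes"])

print(cell_accuracy(ground_truth, prediction))                 # 2 of 3 cells correct
print(attribute_accuracy(ground_truth, prediction, "color"))   # 2 of 3 colors correct
```

A fixed template like this lets each grounding task (data extraction, element localization, attribute recognition) be scored by simple field-wise comparison, and in a two-stage workflow the same records from two charts can be aligned key-by-key for cross-chart comparison.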