ChartLens: Fine-grained Visual Attribution in Charts
May 25, 2025
Authors: Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Dinesha Manocha
cs.AI
Abstract
The growing capabilities of multimodal large language models (MLLMs) have
advanced tasks like chart understanding. However, these models often suffer
from hallucinations, where generated text sequences conflict with the provided
visual data. To address this, we introduce Post-Hoc Visual Attribution for
Charts, which identifies fine-grained chart elements that validate a given
chart-associated response. We propose ChartLens, a novel chart attribution
algorithm that uses segmentation-based techniques to identify chart objects and
employs set-of-marks prompting with MLLMs for fine-grained visual attribution.
Additionally, we present ChartVA-Eval, a benchmark with synthetic and
real-world charts from diverse domains like finance, policy, and economics,
featuring fine-grained attribution annotations. Our evaluations show that
ChartLens improves fine-grained attributions by 26-66%.
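
The abstract does not include implementation details, but the set-of-marks prompting step it describes can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example rather than the paper's actual code: it assumes chart-element regions have already been produced by a separate segmentation step, overlays numbered marks on them with Pillow, and asks a multimodal LLM, via a placeholder `query_mllm` function, which marked elements support a given chart-associated response.

```python
# Minimal sketch of set-of-marks prompting for post-hoc chart attribution.
# Assumes chart-element regions (bars, points, etc.) come from an earlier
# segmentation step; `query_mllm` is a hypothetical stand-in for whatever
# multimodal LLM client is actually used.

from PIL import Image, ImageDraw


def query_mllm(image: Image.Image, prompt: str) -> str:
    """Placeholder: swap in a real multimodal LLM call here."""
    raise NotImplementedError("plug in an MLLM client")


def overlay_marks(chart: Image.Image,
                  regions: list[tuple[int, int, int, int]]) -> Image.Image:
    """Draw a numbered marker on each segmented chart element."""
    marked = chart.copy()
    draw = ImageDraw.Draw(marked)
    for idx, (x0, y0, x1, y1) in enumerate(regions, start=1):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=2)
        draw.text((x0 + 3, y0 + 3), str(idx), fill="red")
    return marked


def attribute_response(chart: Image.Image,
                       regions: list[tuple[int, int, int, int]],
                       response: str) -> list[int]:
    """Ask the MLLM which marked chart elements support the response."""
    marked_chart = overlay_marks(chart, regions)
    prompt = (
        "The chart has numbered markers on its elements. "
        f'Which marker numbers support this statement: "{response}"? '
        "Answer with a comma-separated list of numbers."
    )
    answer = query_mllm(image=marked_chart, prompt=prompt)
    return [int(tok) for tok in answer.split(",") if tok.strip().isdigit()]
```

In a pipeline like the one the abstract describes, the returned marker indices would map back to the segmented chart objects, yielding the fine-grained visual attribution for the given response.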