ChartLens: Fine-grained Visual Attribution in Charts

May 25, 2025
作者: Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Dinesh Manocha
cs.AI

Abstract

The growing capabilities of multimodal large language models (MLLMs) have advanced tasks like chart understanding. However, these models often suffer from hallucinations, where generated text sequences conflict with the provided visual data. To address this, we introduce Post-Hoc Visual Attribution for Charts, which identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects and employs set-of-marks prompting with MLLMs for fine-grained visual attribution. Additionally, we present ChartVA-Eval, a benchmark with synthetic and real-world charts from diverse domains like finance, policy, and economics, featuring fine-grained attribution annotations. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
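The attribution flow the abstract describes — segment the chart into elements, overlay numbered marks, ask an MLLM which marks support a response, and map the answer back to chart regions — can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the segmentation output format, the `query_mllm` call, and the region fields are all assumptions standing in for the real components.

```python
def assign_marks(regions):
    """Give each segmented chart element a numeric mark (1, 2, ...)."""
    return {i + 1: region for i, region in enumerate(regions)}

def build_prompt(response, marks):
    """Set-of-marks style prompt: list marked elements, ask which support the response."""
    listing = "\n".join(f"[{m}] {r['label']}" for m, r in marks.items())
    return (
        "The chart elements are marked as follows:\n"
        f"{listing}\n"
        f'Which marks support this statement: "{response}"?\n'
        "Answer with a comma-separated list of mark numbers."
    )

def parse_marks(model_output, marks):
    """Map the model's returned mark numbers back to chart regions."""
    ids = [int(tok) for tok in model_output.replace(",", " ").split() if tok.isdigit()]
    return [marks[i] for i in ids if i in marks]

# Hypothetical segmented regions from a bar chart (labels and boxes invented):
regions = [
    {"label": "bar: 2021 revenue", "bbox": (40, 80, 60, 200)},
    {"label": "bar: 2022 revenue", "bbox": (120, 50, 60, 230)},
    {"label": "x-axis label: year", "bbox": (0, 300, 400, 20)},
]
marks = assign_marks(regions)
prompt = build_prompt("Revenue grew from 2021 to 2022.", marks)
# In the real pipeline an MLLM would answer from the marked image, e.g.:
# answer = query_mllm(image_with_marks, prompt)   # hypothetical API
attributed = parse_marks("1, 2", marks)  # supposing the model replies "1, 2"
```

In the actual method the marks are drawn onto the chart image itself so the model grounds its answer visually; the text-only listing here just makes the prompt-and-parse round trip concrete.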
