ChartLens: Fine-grained Visual Attribution in Charts

Abstract

The growing capabilities of multimodal large language models (MLLMs) have advanced tasks like chart understanding. However, these models often suffer from hallucinations, where generated text sequences conflict with the provided visual data. To address this, we introduce Post-Hoc Visual Attribution for Charts, which identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects and employs set-of-marks prompting with MLLMs for fine-grained visual attribution. Additionally, we present ChartVA-Eval, a benchmark with synthetic and real-world charts from diverse domains like finance, policy, and economics, featuring fine-grained attribution annotations. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
