ChartLens: Fine-grained Visual Attribution in Charts

Abstract

The growing capabilities of multimodal large language models (MLLMs) have advanced tasks like chart understanding. However, these models often suffer from hallucinations, where generated text sequences conflict with the provided visual data. To address this, we introduce Post-Hoc Visual Attribution for Charts, which identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects and employs set-of-marks prompting with MLLMs for fine-grained visual attribution. Additionally, we present ChartVA-Eval, a benchmark with synthetic and real-world charts from diverse domains like finance, policy, and economics, featuring fine-grained attribution annotations. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
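To make the two-stage idea in the abstract concrete (segment chart objects, then ask an MLLM which marked objects support a response), here is a minimal Python sketch. It is not the authors' implementation: the names ChartObject, assign_marks, build_prompt, parse_marks, and attribute are hypothetical, the segmentation step is assumed to have already produced bounding boxes, and the MLLM is stood in for by any callable that maps a prompt string to a reply string. A real set-of-marks setup would also draw the numeric marks on the chart image and send that image to the model, which is omitted here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class ChartObject:
    """A segmented chart element (hypothetical structure, not from the paper)."""
    mark_id: int   # numeric label that would be overlaid on the chart image
    kind: str      # e.g. "bar", "line", "legend"
    bbox: BBox


def assign_marks(bboxes: List[BBox], kind: str = "bar") -> List[ChartObject]:
    """Give each segmented region a 1-based mark, ordered left to right."""
    ordered = sorted(bboxes, key=lambda b: b[0])
    return [ChartObject(i + 1, kind, b) for i, b in enumerate(ordered)]


def build_prompt(response: str, objects: List[ChartObject]) -> str:
    """Compose the text part of a set-of-marks style query (assumed wording)."""
    listing = "\n".join(f"[{o.mark_id}] {o.kind} at {o.bbox}" for o in objects)
    return (
        "The chart has these marked elements:\n"
        f"{listing}\n"
        f"Statement: {response}\n"
        "List the mark numbers of the elements that support the statement."
    )


def parse_marks(reply: str, objects: List[ChartObject]) -> List[int]:
    """Extract valid, de-duplicated mark ids from a free-text model reply."""
    valid = {o.mark_id for o in objects}
    seen: List[int] = []
    for token in re.findall(r"\d+", reply):
        mark = int(token)
        if mark in valid and mark not in seen:
            seen.append(mark)
    return seen


def attribute(
    response: str, bboxes: List[BBox], mllm: Callable[[str], str]
) -> List[ChartObject]:
    """Return the chart objects the model cites as evidence for `response`."""
    objects = assign_marks(bboxes)
    reply = mllm(build_prompt(response, objects))
    chosen = set(parse_marks(reply, objects))
    return [o for o in objects if o.mark_id in chosen]
```

Passing a stub such as `lambda prompt: "Marks: 3"` for `mllm` is enough to exercise the flow without a model. Discarding mark numbers that do not correspond to any segmented object is one simple guard against the model citing elements that do not exist.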
