ChartWalker: Benchmarking Cross-Chart RAG Tasks

Abstract

Cross-chart retrieval-augmented generation (RAG) is critical for complex multimodal analytical tasks in scientific, business, and political domains. However, existing benchmarks either focus on tables, which are already well-structured and textualized, or generate cross-chart questions by simply extracting key points. The latter approach often induces lexical overlap between queries and evidence and yields logically inconsistent reasoning chains. To address this, we introduce ChartWalker, a novel framework for constructing challenging cross-chart RAG tasks. ChartWalker features a hierarchical knowledge graph construction method tailored to charts, which organizes entities and relations by granularity to preserve analytical structure. We further propose a structure-aware sampling algorithm that synthesizes semantically coherent multi-hop reasoning paths, enabling explicit control over query difficulty and granularity during QA generation. Using this framework, we build and release ChartWalker-Bench, a comprehensive benchmark spanning diverse domains and cross-chart query types. Extensive evaluations across major RAG paradigms reveal significant performance gaps, underscoring the benchmark's difficulty and utility. Finally, we provide ChartWalker-Agent, an agentic baseline, to facilitate analysis and inspire future system design.
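
To make the idea of sampling multi-hop paths over a granularity-organized graph more concrete, here is a minimal toy sketch in Python. It is not the ChartWalker algorithm, whose details are not given in the abstract. Every name in it (`Node`, `build_graph`, `sample_path`, `sample_cross_chart_path`, the chart names, and the three granularity levels) is a hypothetical choice made for illustration. The sketch assumes that hop count can stand in for query difficulty and that a node's level can stand in for granularity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical graph entity; fields are illustrative assumptions."""
    name: str
    chart: str   # chart the entity was extracted from
    level: str   # granularity: "chart", "series", or "point"


def build_graph(pairs):
    """Build an undirected adjacency map from (node, node) pairs."""
    edges = {}
    for a, b in pairs:
        edges.setdefault(a, []).append(b)
        edges.setdefault(b, []).append(a)
    return edges


def sample_path(edges, start, hops, rng):
    """Random walk of `hops` edges that never revisits a node.

    Returns None if the walk reaches a dead end early.
    """
    path = [start]
    for _ in range(hops):
        candidates = [n for n in edges.get(path[-1], []) if n not in path]
        if not candidates:
            return None
        path.append(rng.choice(candidates))
    return path


def sample_cross_chart_path(edges, hops, level, rng, max_tries=1000):
    """Rejection-sample a path that starts and ends at `level`
    and touches at least two different charts."""
    starts = [n for n in edges if n.level == level]
    for _ in range(max_tries):
        path = sample_path(edges, rng.choice(starts), hops, rng)
        if (path is not None
                and path[-1].level == level
                and len({n.chart for n in path}) >= 2):
            return path
    return None


# Toy data: two charts that share the entity "Korea".
gdp_chart = Node("GDP by country", "chart_A", "chart")
gdp_kr = Node("GDP: Korea", "chart_A", "series")
gdp_jp = Node("GDP: Japan", "chart_A", "series")
exp_chart = Node("Exports by country", "chart_B", "chart")
exp_kr = Node("Exports: Korea", "chart_B", "series")

edges = build_graph([
    (gdp_chart, gdp_kr),
    (gdp_chart, gdp_jp),
    (exp_chart, exp_kr),
    (gdp_kr, exp_kr),   # cross-chart link via the shared entity
])

rng = random.Random(0)
easy_path = sample_cross_chart_path(edges, hops=1, level="series", rng=rng)
hard_path = sample_cross_chart_path(edges, hops=3, level="series", rng=rng)
```

In this toy graph, a one-hop series-level path can only connect the two Korea series, while a three-hop path has to route from the Japan GDP series through its parent chart node before crossing into the exports chart. A question generated from the longer path would require more reasoning steps, which is the sense in which hop count acts as a difficulty knob here.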