ChartWalker: A Benchmark for Cross-Chart RAG Tasks

Abstract

Cross-Chart Retrieval-Augmented Generation (RAG) is critical for complex multi-modal analytical tasks in scientific, business, and political domains. However, existing benchmarks either focus on tables, which are well-structured and textualized, or generate cross-chart questions by simply extracting key points, which often induces lexical overlap between queries and evidence and yields logically inconsistent reasoning chains. To address this, we introduce ChartWalker, a novel framework for constructing challenging cross-chart RAG tasks. ChartWalker features a hierarchical knowledge graph construction method tailored to charts, which organizes entities and relations by granularity to preserve analytical structure. We then propose a structure-aware sampling algorithm that synthesizes semantically coherent, multi-hop reasoning paths, enabling explicit control over query difficulty and granularity for QA generation. Built with this framework, we release ChartWalker-Bench, a comprehensive benchmark spanning diverse domains and cross-chart query types. Extensive evaluations across major RAG paradigms reveal significant performance gaps, underscoring the benchmark's difficulty and utility. Furthermore, we provide ChartWalker-Agent, an agentic baseline to facilitate analysis and inspire future system design.
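The abstract does not specify how the structure-aware sampling works. Purely as an illustration of the general idea (constrained multi-hop walks over a granularity-layered graph), the Python sketch below assumes that every entity carries a hypothetical `chart` identifier and an integer granularity `level`, that the hop count serves as the difficulty knob, and that a path counts as cross-chart if it touches at least two charts. The names `Node`, `ChartKG`, and `sample_path` are hypothetical. This is not ChartWalker's actual algorithm.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """An entity in a hypothetical chart knowledge graph."""
    name: str
    chart: str  # assumed: identifier of the chart this entity was extracted from
    level: int  # assumed granularity: 0 = chart-level summary, larger = finer detail


@dataclass
class ChartKG:
    # Adjacency list: node -> list of (relation, neighbor) pairs.
    edges: dict = field(default_factory=dict)

    def add_edge(self, u: Node, rel: str, v: Node) -> None:
        # Relations are stored in both directions so the walk can traverse them either way.
        self.edges.setdefault(u, []).append((rel, v))
        self.edges.setdefault(v, []).append((rel, u))


def sample_path(kg, start, hops, max_level, rng):
    """Randomly search for a simple path with exactly `hops` edges.

    Every node must satisfy level <= max_level (granularity control), and the path
    must touch at least two charts (cross-chart constraint). Returns (nodes, relations)
    or None if no such path exists from `start`.
    """
    def extend(path, rels):
        if len(rels) == hops:
            return (path, rels) if len({n.chart for n in path}) >= 2 else None
        neighbors = list(kg.edges.get(path[-1], []))
        rng.shuffle(neighbors)  # randomize branch order so repeated calls can differ
        for rel, nxt in neighbors:
            if nxt in path or nxt.level > max_level:
                continue  # skip cycles and entities finer than the requested granularity
            found = extend(path + [nxt], rels + [rel])
            if found is not None:
                return found
        return None  # dead end: backtrack

    if start.level > max_level:
        return None
    return extend([start], [])
```

In this sketch, raising `hops` lengthens the reasoning chain a generated question would have to follow. Lowering `max_level` restricts the walk to coarser, chart-level entities. A sampled path and its relation labels could then be handed to a question generator, a step not shown here.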