
ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding

March 28, 2026
Authors: Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Isaac Sanchez, Ben Wiesel, Shafiq Abedin, Amit Alfassy, Eli Schwartz, Daniel Caraballo, Yagmur Gizem Cinar, Florian Scheidegger, Steven I. Ross, Daniel Karl I. Weidele, Hang Hua, Ekaterina Arutyunova, Roei Herzig, Zexue He, Zihan Wang, Xinyue Yu, Yunfei Zhao, Sicong Jiang, Minghao Liu, Qunshu Lin, Peter Staar, Luis Lastras, Aude Oliva, Rogerio Feris
cs.AI

Abstract

Understanding charts requires models to jointly reason over geometric visual patterns, structured numerical data, and natural language, a capability where current vision-language models (VLMs) remain limited. We introduce ChartNet, a high-quality, million-scale multimodal dataset designed to advance chart interpretation and reasoning. ChartNet leverages a novel code-guided synthesis pipeline to generate 1.5 million diverse chart samples spanning 24 chart types and 6 plotting libraries. Each sample consists of five aligned components: plotting code, rendered chart image, data table, natural language summary, and question-answering with reasoning, providing fine-grained cross-modal alignment. To capture the full spectrum of chart comprehension, ChartNet additionally includes specialized subsets encompassing human-annotated data, real-world data, safety, and grounding. Moreover, a rigorous quality-filtering pipeline ensures visual fidelity, semantic accuracy, and diversity across chart representations. Fine-tuning on ChartNet consistently improves results across benchmarks, demonstrating its utility as large-scale supervision for multimodal models. As the largest open-source dataset of its kind, ChartNet aims to support the development of foundation models with robust and generalizable capabilities for data visualization understanding. The dataset is publicly available at https://huggingface.co/datasets/ibm-granite/ChartNet.
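
To make the five-component sample structure concrete, the following is a minimal sketch of how one aligned record might be represented in Python. The class names, field names, and toy values are all assumptions made for illustration; they are not the dataset's actual schema, which should be checked on the dataset page.

```python
from dataclasses import dataclass, field


@dataclass
class QAPair:
    """A question with its reasoning trace and final answer (hypothetical layout)."""
    question: str
    reasoning: str
    answer: str


@dataclass
class ChartSample:
    """One sample holding the five aligned components named in the abstract.

    Field names are illustrative assumptions, not ChartNet's real column names.
    """
    plotting_code: str                    # source code that renders the chart
    image_path: str                       # location of the rendered chart image
    data_table: list[dict[str, object]]   # underlying data, one dict per row
    summary: str                          # natural language summary of the chart
    qa: list[QAPair] = field(default_factory=list)

    def missing_components(self) -> list[str]:
        """Return the names of components that are empty, in declaration order."""
        parts = {
            "plotting_code": self.plotting_code,
            "image_path": self.image_path,
            "data_table": self.data_table,
            "summary": self.summary,
            "qa": self.qa,
        }
        return [name for name, value in parts.items() if not value]


# Toy record with made-up values, used only to exercise the structure.
sample = ChartSample(
    plotting_code="plt.bar(['A', 'B'], [3, 5])",
    image_path="charts/example.png",
    data_table=[{"label": "A", "value": 3}, {"label": "B", "value": 5}],
    summary="Bar B is taller than bar A.",
    qa=[QAPair("Which bar is taller?", "B has value 5, A has value 3.", "B")],
)

print(sample.missing_components())
```

Keeping all five components on one record is what allows a cross-modal check such as comparing a question's answer against the data table that produced the image.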