
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

January 20, 2026
Authors: Zheng Liu, Honglin Lin, Chonghan Qin, Xiaoyang Wang, Xin Gao, Yu Li, Mengzhang Cai, Yun Zhu, Zhanping Zhong, Qizhi Pei, Zhuoshi Pan, Xiaoran Shang, Bin Cui, Conghui He, Wentao Zhang, Lijun Wu
cs.AI

Abstract

Chart reasoning is a critical capability for Vision Language Models (VLMs). However, the development of open-source models is severely hindered by the lack of high-quality training data. Existing datasets suffer from a dual challenge: synthetic charts are often simplistic and repetitive, while the associated QA pairs are prone to hallucinations and lack the reasoning depth required for complex tasks. To bridge this gap, we propose ChartVerse, a scalable framework designed to synthesize complex charts and reliable reasoning data from scratch. (1) To address the bottleneck of simple patterns, we first introduce Rollout Posterior Entropy (RPE), a novel metric that quantifies chart complexity. Guided by RPE, we develop a complexity-aware chart coder that autonomously synthesizes diverse, high-complexity charts via executable programs. (2) To guarantee reasoning rigor, we develop truth-anchored inverse QA synthesis. Diverging from standard generation pipelines, we adopt an answer-first paradigm: we extract deterministic answers directly from the source code, generate questions conditioned on these anchors, and enforce strict consistency verification. To further elevate difficulty and reasoning depth, we filter samples based on model failure rate and distill high-quality Chain-of-Thought (CoT) reasoning. Using Qwen3-VL-30B-A3B-Thinking as the teacher, we curate ChartVerse-SFT-600K and ChartVerse-RL-40K. Experimental results demonstrate that ChartVerse-8B achieves state-of-the-art performance, notably surpassing its teacher and rivaling the stronger Qwen3-VL-32B-Thinking.
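The abstract does not give the exact definition of Rollout Posterior Entropy, but the name suggests measuring the uncertainty of a model's answers across repeated rollouts on the same chart: if many rollouts disagree, the chart is likely complex. A minimal illustrative sketch of that idea, assuming RPE is the Shannon entropy of the empirical answer distribution (the function name and this formulation are our assumption, not the paper's verbatim definition):

```python
import math
from collections import Counter

def rollout_posterior_entropy(rollout_answers):
    """Shannon entropy (in bits) of the empirical distribution of
    answers produced by repeated model rollouts on the same chart
    question. 0.0 means all rollouts agree (an easy chart); higher
    values mean the rollouts disagree (a harder, more complex chart)."""
    counts = Counter(rollout_answers)
    total = len(rollout_answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# All 8 rollouts agree -> entropy 0.0 (simple chart).
easy = rollout_posterior_entropy(["42"] * 8)
# Rollouts split evenly between two answers -> entropy 1.0 bit.
hard = rollout_posterior_entropy(["42", "57", "42", "57"])
```

Under this reading, a complexity-aware synthesizer would keep or mutate chart programs toward higher-entropy regions rather than sampling templates uniformly.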
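The answer-first paradigm can be sketched concretely: because the chart is rendered from a program, the ground-truth value is computable from the program's own data, so the answer is fixed before the question is phrased, and a pair is kept only if an independent re-derivation matches. The data series, function names, and question template below are hypothetical placeholders, not the paper's implementation:

```python
# Hypothetical source data that a chart-generating program would plot.
chart_data = {"2021": 42.0, "2022": 57.5, "2023": 51.0}

def synthesize_qa(data):
    """Answer-first synthesis: derive a deterministic answer from the
    source data, then generate a question conditioned on that anchor."""
    answer = max(data, key=data.get)  # anchor: year with the peak value
    question = "Which year has the highest value in the chart?"
    return question, answer

def verify(data, question, answer):
    """Strict consistency check: recompute the anchor independently
    and reject the QA pair on any mismatch."""
    return answer == max(data, key=data.get)

q, a = synthesize_qa(chart_data)
assert verify(chart_data, q, a)  # only verified pairs enter the dataset
```

The point of the inversion is that the verifier never trusts generated text: every retained answer is re-derivable from the executable source, which is what rules out the hallucinated QA pairs the abstract criticizes.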