ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension
November 4, 2025
Authors: Duo Xu, Hao Cheng, Xin Lin, Zhen Xie, Hao Wang
cs.AI
Abstract
Complex chart understanding tasks demand advanced visual recognition and
reasoning capabilities from multimodal large language models (MLLMs). However,
current research provides limited coverage of complex chart scenarios and
computation-intensive reasoning tasks prevalent in real-world applications.
This study proposes an automated multi-stage code-driven pipeline for
systematically generating visual reasoning datasets to address these
limitations. The pipeline integrates retrieval-augmented generation (RAG) to
retrieve professional chart templates and employs chain-of-thought (CoT)
strategies to generate reasoning code that simulates real data distributions,
thereby driving chart rendering and question-related statistical computations.
Model-based evaluation shows that the pipeline improves chart diversity and data
quality. Using this framework, we construct ChartM^3, a multi-dimensional and
multi-step dataset containing 38K charts and 142K Q&A pairs for training, along
with 2,871 high-quality evaluation samples to enable practical performance
assessment. Supervised fine-tuning (SFT) and reinforcement learning (RL)
experiments demonstrate that our dataset significantly improves reasoning
capabilities and cross-domain generalization performance, enabling smaller
models to achieve performance comparable to larger-scale models in complex
chart comprehension.
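
As a rough illustration of the "code-driven" idea described above, the sketch below derives both the chart data and the answer to a question from the same generated values, so the answer is computed rather than guessed. It is a minimal sketch under stated assumptions and not the authors' pipeline: the helper `make_chart_record`, the quarter labels, the Gaussian parameters, and the question wording are all hypothetical, and chart rendering, template retrieval, and CoT generation are left out.

```python
import random
import statistics


def make_chart_record(seed: int = 0) -> dict:
    """Hypothetical helper: build chart data and a Q&A pair from one source of truth."""
    rng = random.Random(seed)
    categories = ["Q1", "Q2", "Q3", "Q4"]  # assumed labels, for illustration only
    # Synthetic values standing in for a realistic data distribution (assumed Gaussian).
    values = [round(rng.gauss(100, 15), 1) for _ in categories]
    data = dict(zip(categories, values))

    # The same values a plotting script would render also drive the answer.
    top_category = max(data, key=data.get)
    mean_value = round(statistics.mean(values), 2)
    return {
        "data": data,
        "question": "Which quarter has the highest value, and what is the mean across quarters?",
        "answer": {"top_category": top_category, "mean": mean_value},
    }


record = make_chart_record(seed=42)
```

Because the answer is computed from `record["data"]`, a multi-step question of this kind stays consistent with whatever chart is later drawn from the same dictionary.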