ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension
November 4, 2025
Authors: Duo Xu, Hao Cheng, Xin Lin, Zhen Xie, Hao Wang
cs.AI
Abstract
Complex chart understanding tasks demand advanced visual recognition and
reasoning capabilities from multimodal large language models (MLLMs). However,
current research provides limited coverage of complex chart scenarios and
computation-intensive reasoning tasks prevalent in real-world applications.
To address these limitations, this study proposes an automated, multi-stage,
code-driven pipeline for systematically generating visual reasoning datasets.
The pipeline integrates retrieval-augmented generation (RAG) to
retrieve professional chart templates and employs chain-of-thought (CoT)
strategies to generate reasoning code that simulates real data distributions,
thereby driving chart rendering and question-related statistical computations.
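The abstract describes the code-driven step only at a high level. The sketch below illustrates the general idea under our own assumptions: a single seeded program produces both the chart data and the ground-truth answer to a multi-step question, so the answer is computed from the plotted values rather than annotated by hand. The two-year grouped-bar template, the Gaussian data model, and the names TEMPLATE, simulate_data, answer_question, and build_sample are hypothetical and are not the authors' code. Retrieval and rendering are omitted: the template is hard-coded, and the returned chart spec is what would be handed to a plotting library.

```python
import random
import statistics

# Hypothetical chart template. In the described pipeline, templates come from
# a RAG retrieval step; here it is hard-coded for illustration only.
TEMPLATE = {
    "chart_type": "grouped_bar",
    "categories": ["North", "South", "East", "West"],
    "years": [2022, 2023],
}


def simulate_data(template, seed=0):
    """Generate one value per (region, year) from an assumed data model:
    a Gaussian base level per region plus a noisy year-over-year trend."""
    rng = random.Random(seed)  # seeded so the chart and answer are reproducible
    data = {}
    for region in template["categories"]:
        base = rng.gauss(100.0, 15.0)
        data[region] = {}
        for i, year in enumerate(template["years"]):
            growth = 1.0 + rng.gauss(0.05, 0.08) * i  # first year is the baseline
            # Round to what the chart would display, so answers match the plot.
            data[region][year] = round(base * growth, 1)
    return data


def answer_question(data, y0, y1):
    """Multi-step computation: per-region growth rates, then the leader,
    then its gap (in percentage points) over the mean growth rate."""
    rates = {r: (v[y1] - v[y0]) / v[y0] * 100 for r, v in data.items()}
    best = max(rates, key=rates.get)
    gap = rates[best] - statistics.mean(rates.values())
    return best, round(gap, 2)


def build_sample(template, seed=0):
    """Bundle chart spec, question, and computed answer into one QA sample."""
    data = simulate_data(template, seed)
    y0, y1 = template["years"]
    region, gap = answer_question(data, y0, y1)
    chart_spec = {"type": template["chart_type"], "data": data}
    question = (
        f"Which region has the highest growth rate from {y0} to {y1}, and by "
        "how many percentage points does it exceed the average growth rate?"
    )
    return {"chart": chart_spec, "question": question, "answer": f"{region}, {gap}"}
```

Calling build_sample(TEMPLATE, seed=42) twice returns identical samples. Because the answer is derived from the same rounded values that the chart would display, a mismatch between the image and the label cannot arise from hand annotation. Any remaining risk lies in the data model or the question logic, which is plausibly the kind of error a model-based evaluation step is meant to catch.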
Through model-based evaluation, the pipeline enhances chart diversity and data
quality. Using this framework, we construct ChartM^3, a multi-dimensional and
multi-step dataset containing 38K charts and 142K Q&A pairs for training, along
with 2,871 high-quality evaluation samples to enable practical performance
assessment. Supervised fine-tuning (SFT) and reinforcement learning (RL)
experiments demonstrate that our dataset significantly improves reasoning
capabilities and cross-domain generalization performance, enabling smaller
models to achieve performance comparable to larger-scale models in complex
chart comprehension.