
START: Spatial and Textual Learning for Chart Understanding

December 8, 2025
Authors: Zhuoming Liu, Xiaofeng Gao, Feiyang Niu, Qiaozi Gao, Liu Liu, Robinson Piramuthu
cs.AI

Abstract

Chart understanding is crucial for deploying multimodal large language models (MLLMs) in real-world scenarios such as analyzing scientific papers and technical reports. Unlike natural images, charts pair a structured visual layout (spatial property) with an underlying data representation (textual property) -- grasping both is essential for precise, fine-grained chart reasoning. Motivated by this observation, we propose START (Spatial and Textual learning for chART understanding). Specifically, we introduce (i) chart-element grounding and (ii) chart-to-code generation to strengthen an MLLM's understanding of both chart visual layout and data details. To facilitate spatial and textual learning, we propose the START-Dataset, generated with a novel data-generation pipeline that first leverages an MLLM to translate real chart images into executable chart code, recovering the underlying data representation while preserving the visual distribution of real-world charts. We then evolve the code with a large language model (LLM) to ascertain the positions of the chart elements that capture the chart's visual structure, addressing challenges that existing methods cannot handle. To evaluate a model's ability to understand chart spatial structures, we propose the Chart Spatial understanding Benchmark (CS-Bench), filling a critical gap in comprehensive chart understanding evaluation. Leveraging spatial and textual learning, START delivers consistent gains over the base models across model sizes and benchmarks, and surpasses the prior state of the art by a clear margin. Code, data, and models will be publicly available.