ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
June 14, 2024
Authors: Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang
cs.AI
Abstract
We introduce a new benchmark, ChartMimic, aimed at assessing the
visually-grounded code generation capabilities of large multimodal models
(LMMs). ChartMimic utilizes information-intensive visual charts and textual
instructions as inputs, requiring LMMs to generate the corresponding code for
chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction,
code) triplets, which represent authentic chart use cases found in
scientific papers across various domains (e.g., Physics, Computer Science,
Economics, etc.). These charts span 18 regular types and 4 advanced types,
further divided into 191 subcategories. Furthermore, we propose multi-level
evaluation metrics to provide an automatic and thorough assessment of the
output code and the rendered charts. Unlike existing code generation
benchmarks, ChartMimic places emphasis on evaluating LMMs' capacity to
harmonize a blend of cognitive capabilities, encompassing visual understanding,
code generation, and cross-modal reasoning. The evaluation of 3 proprietary
models and 11 open-weight models highlights the substantial challenges posed by
ChartMimic. Even the advanced GPT-4V and Claude-3-opus achieve average scores
of only 73.2 and 53.7, respectively, indicating significant room for
improvement. We anticipate that ChartMimic will inspire the development of
LMMs, advancing the pursuit of artificial general intelligence.
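The abstract does not say how the multi-level metrics are computed. As a minimal sketch under stated assumptions, the Python below models one (figure, instruction, code) triplet and a set-overlap F1 score of the kind a low-level metric might use to compare elements (for example, text labels) extracted from a generated chart and its reference. It then averages per-example scores onto a 0-100 scale, assuming that is the scale of the reported averages. The `ChartTriplet` class, its field names, and the `set_f1` and `average_score` helpers are hypothetical illustrations, not the benchmark's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChartTriplet:
    """One benchmark example: a chart image, an instruction, and reference code.

    Hypothetical field names; the abstract only states that each example
    is a (figure, instruction, code) triplet.
    """
    figure_path: str
    instruction: str
    code: str


def set_f1(predicted: set[str], reference: set[str]) -> float:
    """F1 overlap between element sets from the generated and reference charts.

    A hypothetical low-level check (e.g. over text labels); not the
    paper's exact metric.
    """
    if not predicted and not reference:
        return 1.0  # nothing to recover and nothing spurious
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def average_score(scores: list[float]) -> float:
    """Mean of per-example scores in [0, 1], rescaled to 0-100 (assumed scale)."""
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

For example, a prediction that recovers three of four reference labels and adds no spurious ones has precision 1.0 and recall 0.75, so `set_f1` returns 6/7 (about 0.857). Using set overlap rather than exact string equality of the code rewards partially correct charts instead of scoring them as outright failures.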