ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

June 14, 2024
Authors: Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang
cs.AI

Abstract

We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic takes information-intensive visual charts and textual instructions as inputs and requires LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which represent authentic chart use cases found in scientific papers across various domains (e.g., Physics, Computer Science, Economics, etc.). These charts span 18 regular types and 4 advanced types, subdivided into 191 subcategories. Furthermore, we propose multi-level evaluation metrics to provide an automatic and thorough assessment of the output code and the rendered charts. Unlike existing code generation benchmarks, ChartMimic emphasizes evaluating LMMs' capacity to harmonize a blend of cognitive capabilities, encompassing visual understanding, code generation, and cross-modal reasoning. The evaluation of 3 proprietary models and 11 open-weight models highlights the substantial challenges posed by ChartMimic. Even the advanced GPT-4V and Claude-3-opus achieve average scores of only 73.2 and 53.7, respectively, indicating significant room for improvement. We anticipate that ChartMimic will inspire the development of LMMs, advancing the pursuit of artificial general intelligence.
