ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

July 4, 2024
Authors: Ahmed Masry, Megh Thakkar, Aayush Bajaj, Aaryaman Kartha, Enamul Hoque, Shafiq Joty
cs.AI

Abstract

Given the ubiquity of charts as a tool for data analysis, visualization, and decision-making across industries and sciences, there has been growing interest in developing pre-trained foundation models as well as general-purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer from crucial drawbacks along two axes that affect the performance of chart representation models: they are trained on data generated from the charts' underlying data tables, ignoring the visual trends and patterns in chart images, and they use weakly aligned vision-language backbone models for domain-specific training, which limits their generalizability to charts in the wild. We address these drawbacks and introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma. Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images, thus capturing both high-level trends and low-level visual information from a diverse set of charts. Our simple approach achieves state-of-the-art results across 5 benchmarks spanning chart summarization, question answering, and fact-checking, and our extensive qualitative studies on real-world charts show that ChartGemma generates more realistic and factually correct summaries than its contemporaries. We release the code, model checkpoints, dataset, and demos at https://github.com/vis-nlp/ChartGemma.
