Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

June 25, 2024
Authors: Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee
cs.AI

Abstract

Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge this gap, we address the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset, which enhances both the breadth and depth of multimodal mathematical questions. We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K. This novel approach significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5, achieving a 19-point increase and comparable performance to GPT-4V on MathVista's minitest split. Furthermore, Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark. Our research highlights the importance of dataset diversity and synthesis in advancing MLLMs' mathematical reasoning abilities. The code and data are available at: https://github.com/HZQ950419/Math-LLaVA.
