MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

March 19, 2025
Authors: Felix Chen, Hangjie Yuan, Yunqiu Xu, Tao Feng, Jun Cen, Pengwei Liu, Zeying Huang, Yi Yang
cs.AI

Abstract

Despite impressive performance across diverse tasks, Multimodal Large Language Models (MLLMs) have yet to fully demonstrate their potential in visual mathematical problem-solving, particularly in accurately perceiving and interpreting diagrams. Inspired by the typical human thought process, we hypothesize that the perceptual ability to extract meaningful information from diagrams is crucial, as it directly affects the subsequent inference process. To validate this hypothesis, we developed FlowVerse, a comprehensive benchmark that categorizes all information used during problem-solving into four components, which are then combined into six problem versions for evaluation. Our preliminary results on FlowVerse reveal that existing MLLMs exhibit substantial limitations both in extracting essential information and reasoned properties from diagrams and in performing complex reasoning based on these visual inputs. In response, we introduce MathFlow, a modular problem-solving pipeline that decouples perception and inference into distinct stages so that each can be optimized independently. Given the perceptual limitations observed in current MLLMs, we trained MathFlow-P-7B as a dedicated perception model. Experimental results indicate that MathFlow-P-7B yields substantial performance gains when integrated with various closed-source and open-source inference models. This demonstrates the effectiveness of the MathFlow pipeline and its compatibility with diverse inference frameworks. The FlowVerse benchmark and code are available at https://github.com/MathFlow-zju/MathFlow.
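The decoupling of perception and inference described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `TwoStagePipeline`, the type aliases, the prompt layout, and the stub functions are all hypothetical names introduced here, and the stubs stand in for real models (such as a perception model and any text-only reasoning model).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases; neither name comes from the MathFlow repository.
PerceptionFn = Callable[[bytes, str], str]  # (diagram image, question) -> text description
InferenceFn = Callable[[str], str]          # text prompt -> answer


@dataclass
class TwoStagePipeline:
    """Assumed structure: perception and inference are independent, swappable stages."""
    perceive: PerceptionFn
    infer: InferenceFn

    def solve(self, diagram: bytes, question: str) -> str:
        # Stage 1: convert the diagram into a textual description.
        description = self.perceive(diagram, question)
        # Stage 2: reason over text only; the inference model never sees the image.
        prompt = f"Diagram description:\n{description}\n\nQuestion:\n{question}"
        return self.infer(prompt)


def stub_perceive(diagram: bytes, question: str) -> str:
    # Placeholder for a perception model; returns a fixed description.
    return "Triangle ABC with AB = 3, BC = 4, angle B = 90 degrees."


def stub_infer(prompt: str) -> str:
    # Placeholder for an inference model; recognizes only the 3-4-5 case.
    return "5" if "AB = 3" in prompt and "BC = 4" in prompt else "unknown"


pipeline = TwoStagePipeline(perceive=stub_perceive, infer=stub_infer)
answer = pipeline.solve(b"", "What is the length of AC?")
```

Because the two stages share only a text interface in this sketch, either callable can be replaced without touching the other, which mirrors the abstract's claim that one perception model can be paired with various inference models.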
