MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
March 19, 2025
Authors: Felix Chen, Hangjie Yuan, Yunqiu Xu, Tao Feng, Jun Cen, Pengwei Liu, Zeying Huang, Yi Yang
cs.AI
Abstract
Despite impressive performance across diverse tasks, Multimodal Large
Language Models (MLLMs) have yet to fully demonstrate their potential in visual
mathematical problem-solving, particularly in accurately perceiving and
interpreting diagrams. Inspired by the typical human problem-solving process, we
hypothesize that the perceptual capability to extract meaningful information from
diagrams is crucial, as it directly impacts subsequent inference processes. To
validate this hypothesis, we developed FlowVerse, a comprehensive benchmark
that categorizes all information used during problem-solving into four
components, which are then combined into six problem versions for evaluation.
Our preliminary results on FlowVerse reveal that existing MLLMs exhibit
substantial limitations when extracting essential information and reasoned
properties from diagrams and performing complex reasoning based on these visual
inputs. In response, we introduce MathFlow, a modular problem-solving pipeline
that decouples perception and inference into distinct stages, thereby
optimizing each independently. Given the perceptual limitations observed in
current MLLMs, we trained MathFlow-P-7B as a dedicated perception model.
Experimental results indicate that MathFlow-P-7B yields substantial performance
gains when integrated with various closed-source and open-source inference
models. This demonstrates the effectiveness of the MathFlow pipeline and its
compatibility with diverse inference frameworks. The FlowVerse benchmark and code
are available at https://github.com/MathFlow-zju/MathFlow.
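
The decoupling of perception and inference described above can be pictured as two interchangeable callables chained together. The following is only a minimal sketch under stated assumptions: the names `TwoStagePipeline`, `PerceptionModel`, `InferenceModel`, and the stub functions are hypothetical and are not taken from the MathFlow repository, and the stubs stand in for real models such as MathFlow-P-7B or an inference model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases for illustration only (not the MathFlow API).
PerceptionModel = Callable[[bytes], str]  # diagram image bytes -> text description
InferenceModel = Callable[[str], str]     # text prompt -> final answer


@dataclass
class TwoStagePipeline:
    """Chains a perception stage and an inference stage that can be swapped independently."""
    perceive: PerceptionModel
    infer: InferenceModel

    def solve(self, question: str, diagram: bytes) -> str:
        # Stage 1: turn the diagram into text the inference model can read.
        description = self.perceive(diagram)
        # Stage 2: reason over the question plus the extracted description.
        prompt = f"Diagram description:\n{description}\n\nQuestion:\n{question}"
        return self.infer(prompt)


def stub_perception(diagram: bytes) -> str:
    # Stand-in for a dedicated perception model; returns a fixed description.
    return "Right triangle ABC with AB = 3, BC = 4, right angle at B."


def stub_inference(prompt: str) -> str:
    # Stand-in for an inference model; answers only if the description arrived.
    return "5" if "AB = 3" in prompt and "BC = 4" in prompt else "unknown"


pipeline = TwoStagePipeline(perceive=stub_perception, infer=stub_inference)
answer = pipeline.solve("Find the length of AC.", diagram=b"")
```

Because each stage is just a callable, a different inference model can be plugged in without touching the perception stage, which is the kind of compatibility the abstract reports.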