VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
March 13, 2025
Authors: Weiyun Wang, Zhangwei Gao, Lianjie Chen, Zhe Chen, Jinguo Zhu, Xiangyu Zhao, Yangzhou Liu, Yue Cao, Shenglong Ye, Xizhou Zhu, Lewei Lu, Haodong Duan, Yu Qiao, Jifeng Dai, Wenhai Wang
cs.AI
Abstract
We introduce VisualPRM, an advanced multimodal Process Reward Model (PRM)
with 8B parameters, which improves the reasoning abilities of existing
Multimodal Large Language Models (MLLMs) across different model scales and
families with Best-of-N (BoN) evaluation strategies. Specifically, our model
improves the reasoning performance of three types of MLLMs and four different
model scales. Even when applied to the highly capable InternVL2.5-78B, it
achieves a 5.9-point improvement across seven multimodal reasoning benchmarks.
Experimental results show that our model exhibits superior performance compared
to Outcome Reward Models and Self-Consistency during BoN evaluation. To
facilitate the training of multimodal PRMs, we construct a multimodal process
supervision dataset VisualPRM400K using an automated data pipeline. For the
evaluation of multimodal PRMs, we propose VisualProcessBench, a benchmark with
human-annotated step-wise correctness labels, to measure the abilities of PRMs
to detect erroneous steps in multimodal reasoning tasks. We hope that our work
can inspire more future research and contribute to the development of MLLMs.
Our model, data, and benchmark are released at
https://internvl.github.io/blog/2025-03-13-VisualPRM/.
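The abstract compares three ways of picking one answer from N sampled responses: a PRM, an Outcome Reward Model, and Self-Consistency. The minimal Python sketch below contrasts them, assuming a PRM that returns one correctness score in [0, 1] per reasoning step. The names `Candidate`, `best_of_n_prm`, `toy_prm`, and the toy score table are hypothetical illustrations, not the API of the VisualPRM release. Aggregating step scores by their mean is one common choice (min is another) and is an assumption here.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    """One sampled response from the policy MLLM (hypothetical structure)."""
    steps: List[str]  # reasoning steps, in order
    answer: str       # final answer extracted from the response


def prm_response_score(step_scores: Sequence[float], agg: str = "mean") -> float:
    """Collapse per-step PRM scores (assumed in [0, 1]) into one response score."""
    if not step_scores:
        raise ValueError("need at least one step score")
    if agg == "mean":
        return mean(step_scores)
    if agg == "min":
        return min(step_scores)
    raise ValueError(f"unknown aggregation: {agg}")


def best_of_n_prm(candidates: Sequence[Candidate],
                  step_scorer: Callable[[List[str]], List[float]],
                  agg: str = "mean") -> Candidate:
    """PRM-based BoN: score every step, aggregate, keep the top response."""
    return max(candidates,
               key=lambda c: prm_response_score(step_scorer(c.steps), agg))


def best_of_n_orm(candidates: Sequence[Candidate],
                  outcome_scorer: Callable[[Candidate], float]) -> Candidate:
    """ORM-based BoN: one score per whole response, keep the top response."""
    return max(candidates, key=outcome_scorer)


def self_consistency(candidates: Sequence[Candidate]) -> str:
    """Self-Consistency: majority vote over final answers, no reward model."""
    return Counter(c.answer for c in candidates).most_common(1)[0][0]


# Toy data: these step scores stand in for a real PRM's outputs (hypothetical).
TOY_STEP_SCORES = {"s1": 0.9, "s2": 0.2, "s3": 0.7, "s4": 0.8, "s5": 0.6, "s6": 0.5}


def toy_prm(steps: List[str]) -> List[float]:
    """Hypothetical step scorer: looks each step up in the toy table."""
    return [TOY_STEP_SCORES[s] for s in steps]


cands = [
    Candidate(["s1", "s2"], "12"),  # strong first step, weak second step
    Candidate(["s3", "s4"], "15"),  # every step scores reasonably well
    Candidate(["s5", "s6"], "12"),
]

picked_prm = best_of_n_prm(cands, toy_prm)  # mean scores 0.55, 0.75, 0.55 -> answer "15"
picked_sc = self_consistency(cands)          # two votes for "12"
picked_orm = best_of_n_orm(cands, lambda c: {"12": 0.4, "15": 0.3}[c.answer])
```

In this toy case two responses agree on "12", so majority voting selects it, while step-level scoring prefers the response whose individual steps all look sound. An ORM sees only a single score per response and cannot tell which step went wrong.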