

Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models

February 20, 2025
Authors: Michihiro Yasunaga, Luke Zettlemoyer, Marjan Ghazvininejad
cs.AI

Abstract

Reward models play an essential role in training vision-language models (VLMs) by assessing output quality, enabling alignment with human preferences. Despite their importance, the research community lacks comprehensive open benchmarks for evaluating multimodal reward models in VLMs. To address this gap, we introduce Multimodal RewardBench, an expert-annotated benchmark covering six domains: general correctness, preference, knowledge, reasoning, safety, and visual question-answering. Our dataset comprises 5,211 annotated (prompt, chosen response, rejected response) triplets collected from various VLMs. In evaluating a range of VLM judges, we find that even the top-performing models, Gemini 1.5 Pro and Claude 3.5 Sonnet, achieve only 72% overall accuracy. Notably, most models struggle in the reasoning and safety domains. These findings suggest that Multimodal RewardBench offers a challenging testbed for advancing reward model development across multiple domains. We release the benchmark at https://github.com/facebookresearch/multimodal_rewardbench.
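
The evaluation implied by the abstract is pairwise: for each (prompt, chosen response, rejected response) triplet, a judge model is asked which response is better, and the reported accuracy is the fraction of triplets on which it picks the human-preferred ("chosen") response. The Python sketch below is a minimal illustration of that accuracy computation, not the released evaluation code; the file name multimodal_rewardbench.jsonl, the prompt/chosen/rejected field names, and the judge_prefers_first placeholder are assumptions for illustration. In the actual benchmark each prompt also comes with an image, and the judge is a VLM such as Gemini 1.5 Pro or Claude 3.5 Sonnet; see the GitHub repository above for the authoritative setup.

import json
import random

def judge_prefers_first(prompt: str, response_a: str, response_b: str) -> bool:
    # Placeholder judge used only to make the sketch runnable: prefers the
    # longer response. A real judge would query a VLM and would also receive
    # the image associated with the prompt.
    return len(response_a) >= len(response_b)

def pairwise_accuracy(triplets, seed=0):
    # Fraction of (prompt, chosen, rejected) triplets on which the judge
    # selects the human-preferred response. The order of the two responses is
    # randomized per example so the metric does not reward position bias.
    rng = random.Random(seed)
    correct = 0
    for ex in triplets:
        chosen_first = rng.random() < 0.5
        if chosen_first:
            a, b = ex["chosen"], ex["rejected"]
        else:
            a, b = ex["rejected"], ex["chosen"]
        if judge_prefers_first(ex["prompt"], a, b) == chosen_first:
            correct += 1
    return correct / len(triplets)

if __name__ == "__main__":
    # Hypothetical file layout: one JSON object per line with
    # "prompt", "chosen", and "rejected" fields.
    with open("multimodal_rewardbench.jsonl") as f:
        data = [json.loads(line) for line in f]
    print(f"Judge accuracy: {pairwise_accuracy(data):.1%}")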

