LLaVA-Critic: Learning to Evaluate Multimodal Models
October 3, 2024
Authors: Tianyi Xiong, Xiyao Wang, Dong Guo, Qinghao Ye, Haoqi Fan, Quanquan Gu, Heng Huang, Chunyuan Li
cs.AI
Abstract
We introduce LLaVA-Critic, the first open-source large multimodal model (LMM)
designed as a generalist evaluator to assess performance across a wide range of
multimodal tasks. LLaVA-Critic is trained using a high-quality critic
instruction-following dataset that incorporates diverse evaluation criteria and
scenarios. Our experiments demonstrate the model's effectiveness in two key
areas: (1) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation
scores, performing on par with or surpassing GPT models on multiple evaluation
benchmarks; and (2) Preference Learning, where it generates reward signals for
preference learning, enhancing model alignment capabilities. This work
underscores the potential of open-source LMMs in self-critique and evaluation,
setting the stage for future research into scalable, superhuman alignment
feedback mechanisms for LMMs.
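The two roles described in the abstract, judging responses and supplying reward signals for preference learning, can be illustrated with a minimal sketch. Here `critic_score` is a hypothetical stand-in for a LLaVA-Critic forward pass (the real model judges an image, question, and response with learned criteria; the toy heuristic below is purely illustrative), and `build_preference_pair` shows how such scores can rank candidate responses into the chosen/rejected pairs typically consumed by DPO-style preference learning:

```python
# Sketch of a critic model used (1) as a judge that scores responses and
# (2) as a reward-signal source for preference learning.

def critic_score(question: str, response: str) -> float:
    """Hypothetical scorer standing in for a LLaVA-Critic forward pass.
    This toy heuristic rewards word overlap with the question plus a small
    length bonus; the real critic is a trained multimodal model."""
    overlap = len(set(question.lower().split()) & set(response.lower().split()))
    return overlap + 0.01 * len(response.split())

def build_preference_pair(question: str, candidates: list[str]) -> dict:
    """Score each candidate and emit a (chosen, rejected) pair, the data
    format commonly used for DPO-style preference learning."""
    ranked = sorted(candidates, key=lambda r: critic_score(question, r),
                    reverse=True)
    return {"prompt": question, "chosen": ranked[0], "rejected": ranked[-1]}

pair = build_preference_pair(
    "What animal is in the image?",
    ["A cat is sitting on the windowsill in the image.",
     "I cannot tell."],
)
print(pair["chosen"])  # the higher-scored response becomes "chosen"
```

In the paper's actual pipeline the scores come from the trained critic model rather than a heuristic, but the surrounding plumbing (score, rank, emit preference pairs) follows this shape.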