LLaVA-Critic: Learning to Evaluate Multimodal Models

Abstract

We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks. LLaVA-Critic is trained using a high-quality critic instruction-following dataset that incorporates diverse evaluation criteria and scenarios. Our experiments demonstrate the model's effectiveness in two key areas: (1) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation scores, performing on par with or surpassing GPT models on multiple evaluation benchmarks; and (2) Preference Learning, where it generates reward signals for preference learning, enhancing model alignment capabilities. This work underscores the potential of open-source LMMs in self-critique and evaluation, setting the stage for future research into scalable, superhuman alignment feedback mechanisms for LMMs.
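To make the second use case more concrete, the sketch below shows one common way that scalar scores from a judge model can be turned into a (chosen, rejected) pair for preference learning. It is an illustration under stated assumptions, not the procedure described by the authors: the names `ScoredResponse`, `PreferencePair`, `build_preference_pair`, and the `min_margin` threshold are all hypothetical, and the scores are assumed to have already been produced by a critic model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredResponse:
    text: str
    score: float  # hypothetical scalar score assigned by a critic model


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_preference_pair(
    prompt: str,
    candidates: List[ScoredResponse],
    min_margin: float = 1.0,  # assumed threshold; not taken from the paper
) -> Optional[PreferencePair]:
    """Pick the highest- and lowest-scored responses as chosen/rejected.

    Returns None when there are fewer than two candidates or when the
    score gap is too small to be a trustworthy preference signal.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.score - worst.score < min_margin:
        return None
    return PreferencePair(prompt=prompt, chosen=best.text, rejected=worst.text)
```

Pairs built this way could then be passed to a preference-optimization method such as DPO. The margin filter is a design choice to avoid training on near-ties, where the judge's ranking is least reliable.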
