Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge

Abstract

Multimodal Large Language Models (MLLMs) are widely adopted as judges, a setup known as MLLM-as-a-Judge, because their judgments align closely with human judgment across a range of visual tasks. However, most existing judge models are optimized for single-task scenarios and struggle to generalize to diverse contexts, a critical requirement for reliable evaluation. To address this limitation, we propose Multi-Task Reinforcement Learning for MLLM-as-a-Judge (MT-RL-Judge), a framework that jointly optimizes the judge model across multiple tasks and leverages the generalization capabilities of reinforcement learning (RL). Experiments show that MT-RL-Judge outperforms several strong baselines in both judgment consistency and correlation with human preferences. It also generalizes robustly to out-of-distribution tasks, further supporting its effectiveness.
