
Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge

March 12, 2026
Authors: Junjie Wu, Xuan Kan, Zihao He, Shunwen Tan, Bo Pan, Kaitai Zhang
cs.AI

Abstract

Multimodal Large Language Models (MLLMs) have been widely adopted as MLLM-as-a-Judges due to their strong alignment with human judgment across various visual tasks. However, most existing judge models are optimized for single-task scenarios and struggle to generalize to diverse contexts, which is a critical requirement for reliable evaluation. To address this limitation, we propose Multi-Task Reinforcement Learning for MLLM-as-a-Judge (MT-RL-Judge), a framework that jointly optimizes the judge model across multiple tasks, leveraging the generalization capabilities of RL. Experimental results demonstrate that MT-RL-Judge outperforms several strong baselines in both judgment consistency and correlation with human preferences. Furthermore, our approach exhibits robust generalization on out-of-distribution tasks, further validating its effectiveness.
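The abstract's core idea, jointly training a judge model with RL over several evaluation tasks, can be illustrated with a minimal sketch. The task names, data format, and the binary agreement reward below are illustrative assumptions, not details taken from the paper:

```python
import random

# Hypothetical sketch: mix examples from multiple judging tasks into one
# RL batch, and reward the judge when its verdict matches the human
# preference label. Task names and reward scheme are assumptions.

TASKS = {
    "pairwise_comparison": [  # (prompt, judge_output, human_label)
        ("Which caption is better, A or B?", "A", "A"),
        ("Which caption is better, A or B?", "B", "A"),
    ],
    "pointwise_scoring": [
        ("Rate this answer from 1 to 5.", "4", "4"),
        ("Rate this answer from 1 to 5.", "2", "3"),
    ],
}

def judge_reward(judge_output: str, human_label: str) -> float:
    """Binary reward: 1.0 when the judge agrees with the human label."""
    return 1.0 if judge_output == human_label else 0.0

def sample_multitask_batch(rng: random.Random, batch_size: int):
    """Uniformly mix examples from all tasks into one training batch."""
    batch = []
    for _ in range(batch_size):
        task = rng.choice(sorted(TASKS))
        batch.append((task, rng.choice(TASKS[task])))
    return batch

rng = random.Random(0)
batch = sample_multitask_batch(rng, batch_size=4)
rewards = [judge_reward(out, label) for _, (_, out, label) in batch]
```

In a real RL setup these rewards would feed a policy-gradient update of the judge model; the point of the multi-task batch is that a single update step sees several evaluation contexts at once, which is what the paper credits for the improved generalization.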