
Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge

March 12, 2026
Authors: Junjie Wu, Xuan Kan, Zihao He, Shunwen Tan, Bo Pan, Kaitai Zhang
cs.AI

Abstract

Multimodal Large Language Models (MLLMs) have been widely adopted as MLLM-as-a-Judges due to their strong alignment with human judgment across various visual tasks. However, most existing judge models are optimized for single-task scenarios and struggle to generalize to diverse contexts, which is a critical requirement for reliable evaluation. To address this limitation, we propose Multi-Task Reinforcement Learning for MLLM-as-a-Judge (MT-RL-Judge), a framework that jointly optimizes the judge model across multiple tasks, leveraging the generalization capabilities of RL. Experimental results demonstrate that MT-RL-Judge outperforms several strong baselines in both judgment consistency and correlation with human preferences. Furthermore, our approach exhibits robust generalization on out-of-distribution tasks, further validating its effectiveness.
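The abstract's core idea, jointly training a judge model with RL over several evaluation tasks, can be illustrated with a minimal sketch. The task names, data format, and the binary agreement reward below are illustrative assumptions, not details taken from the paper:

```python
import random

# Hypothetical sketch: mix examples from multiple judging tasks into one
# RL batch, and reward the judge when its verdict matches the human
# preference label. Task names and reward scheme are assumptions.

TASKS = {
    "pairwise_comparison": [  # (prompt, judge_output, human_label)
        ("Which caption is better, A or B?", "A", "A"),
        ("Which caption is better, A or B?", "B", "A"),
    ],
    "pointwise_scoring": [
        ("Rate this answer from 1 to 5.", "4", "4"),
        ("Rate this answer from 1 to 5.", "2", "3"),
    ],
}

def judge_reward(judge_output: str, human_label: str) -> float:
    """Binary reward: 1.0 when the judge agrees with the human label."""
    return 1.0 if judge_output == human_label else 0.0

def sample_multitask_batch(rng: random.Random, batch_size: int):
    """Uniformly mix examples from all tasks into one training batch."""
    batch = []
    for _ in range(batch_size):
        task = rng.choice(sorted(TASKS))
        batch.append((task, rng.choice(TASKS[task])))
    return batch

rng = random.Random(0)
batch = sample_multitask_batch(rng, batch_size=4)
rewards = [judge_reward(out, label) for _, (_, out, label) in batch]
```

In a real RL setup these rewards would feed a policy-gradient update of the judge model; the point of the multi-task batch is that a single update step sees several evaluation contexts at once, which is what the paper credits for the improved generalization.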