

Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

October 27, 2025
Authors: Zhuoran Jin, Hongbang Yuan, Kejian Zhu, Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
cs.AI

Abstract

Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs focus primarily on text and image modalities and offer limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address these challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities: text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.
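To make the contrast between fixed binary preference pairs and free-form preferences concrete, the sketch below shows what a preference record conditioned on a user-stated criterion might look like and how a reward model could rank two candidate responses under it. This is a minimal, hypothetical illustration: the record fields, the `reward_fn` interface, and the `judge` helper are assumptions for exposition, not the actual Omni-RewardData schema or the Omni-RewardModel API.

```python
from dataclasses import dataclass

# Hypothetical record types; not the actual Omni-RewardData schema.

@dataclass
class BinaryPreferencePair:
    """Fixed binary preference: response_a is always the chosen response."""
    prompt: str
    response_a: str  # chosen
    response_b: str  # rejected

@dataclass
class FreeFormPreferencePair:
    """Preference pair conditioned on a free-form, user-specified criterion."""
    prompt: str
    response_a: str
    response_b: str
    preference: str         # e.g. "prefer concise answers with cited sources"
    modality: str = "text"  # e.g. "text", "image", "video", "audio", "3d"

def judge(pair: FreeFormPreferencePair, reward_fn) -> str:
    """Return the response with the higher reward under the stated preference.

    `reward_fn(prompt, response, preference)` stands in for a reward model
    (discriminative or generative); its signature here is an assumption.
    """
    score_a = reward_fn(pair.prompt, pair.response_a, pair.preference)
    score_b = reward_fn(pair.prompt, pair.response_b, pair.preference)
    return "response_a" if score_a >= score_b else "response_b"

if __name__ == "__main__":
    pair = FreeFormPreferencePair(
        prompt="Summarize the attached clip.",
        response_a="A detailed, timestamped summary of every scene...",
        response_b="A one-sentence summary.",
        preference="prefer brief summaries under 20 words",
        modality="video",
    )
    # Toy reward: shorter responses score higher when brevity is preferred.
    toy_reward = lambda prompt, resp, pref: -len(resp) if "brief" in pref else len(resp)
    print(judge(pair, toy_reward))  # -> "response_b"
```

The point of the extra `preference` field is that the same pair of responses can be ranked differently under different criteria, which a fixed binary label cannot express.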