

R3: Robust Rubric-Agnostic Reward Models

May 19, 2025
Authors: David Anugraha, Zilu Tang, Lester James V. Miranda, Hanyang Zhao, Mohammad Rifqi Farhansyah, Garry Kuwanto, Derry Wijaya, Genta Indra Winata
cs.AI

Abstract

Reward models are essential for aligning language model outputs with human preferences, yet existing approaches often lack both controllability and interpretability. These models are typically optimized for narrow objectives, limiting their generalizability to broader downstream tasks. Moreover, their scalar outputs are difficult to interpret without contextual reasoning. To address these limitations, we introduce R3, a novel reward modeling framework that is rubric-agnostic, generalizable across evaluation dimensions, and provides interpretable, reasoned score assignments. R3 enables more transparent and flexible evaluation of language models, supporting robust alignment with diverse human values and use cases. Our models, data, and code are available as open source at https://github.com/rubricreward/r3
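The rubric-agnostic design described in the abstract can be illustrated with a minimal sketch: the evaluator accepts an arbitrary, caller-supplied rubric as part of its prompt and returns a free-text rationale together with a final score, making the assignment interpretable. The prompt template, score range, and parsing convention below are illustrative assumptions, not the paper's actual format; a real setup would send the prompt to one of the released R3 models.

```python
import re


def build_prompt(rubric: str, response: str) -> str:
    """Compose an evaluation prompt from an arbitrary, caller-supplied rubric.

    Because the rubric is plain text, the same evaluator can score any
    dimension (helpfulness, safety, factuality, ...) without retraining.
    """
    return (
        "You are an evaluator. Score the response below against the rubric.\n"
        f"Rubric:\n{rubric}\n\n"
        f"Response:\n{response}\n\n"
        "First explain your reasoning, then end with 'Score: <1-5>'."
    )


def parse_score(generation: str) -> tuple[str, int]:
    """Split a generation into its free-text rationale and final integer score."""
    match = re.search(r"Score:\s*([1-5])\s*$", generation.strip())
    if match is None:
        raise ValueError("no score found in generation")
    rationale = generation[: match.start()].strip()
    return rationale, int(match.group(1))


# Example with a stubbed model output (a real run would query the model):
stub_output = "The response is concise and factually correct.\nScore: 4"
rationale, score = parse_score(stub_output)
print(score)  # → 4
```

The key design point this sketch mirrors is that the rubric is data, not a fixed training objective, so switching evaluation dimensions only changes the prompt, and the rationale preceding the score gives the contextual reasoning that a bare scalar reward lacks.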

