

R3: Robust Rubric-Agnostic Reward Models

May 19, 2025
Authors: David Anugraha, Zilu Tang, Lester James V. Miranda, Hanyang Zhao, Mohammad Rifqi Farhansyah, Garry Kuwanto, Derry Wijaya, Genta Indra Winata
cs.AI

Abstract

Reward models are essential for aligning language model outputs with human preferences, yet existing approaches often lack both controllability and interpretability. These models are typically optimized for narrow objectives, limiting their generalizability to broader downstream tasks. Moreover, their scalar outputs are difficult to interpret without contextual reasoning. To address these limitations, we introduce R3, a novel reward modeling framework that is rubric-agnostic, generalizable across evaluation dimensions, and provides interpretable, reasoned score assignments. R3 enables more transparent and flexible evaluation of language models, supporting robust alignment with diverse human values and use cases. Our models, data, and code are available as open source at https://github.com/rubricreward/r3.
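To make the abstract's contrast concrete, the sketch below illustrates what a rubric-agnostic, reasoned reward output looks like compared with a bare scalar. It is a minimal toy, not code from the rubricreward/r3 repository: the `RewardJudgment` type, the `judge` function, and its length heuristic are all hypothetical; a real R3-style model would generate the score and justification with a language model conditioned on an arbitrary rubric.

```python
from dataclasses import dataclass

# Hypothetical sketch: names and scoring logic are illustrative only,
# not drawn from the rubricreward/r3 codebase.

@dataclass
class RewardJudgment:
    score: int        # discrete score on the supplied rubric's scale
    reasoning: str    # natural-language justification (interpretability)

def judge(response: str, rubric: str) -> RewardJudgment:
    """Toy stand-in for a reward model that conditions on an arbitrary
    rubric string and returns a reasoned score instead of a bare scalar."""
    # A trivial length heuristic keeps the sketch runnable; a real model
    # would reason about the response content against the rubric.
    n_words = len(response.split())
    score = 5 if n_words >= 5 else 2
    reasoning = (
        f"Judged against rubric '{rubric}': response has {n_words} words."
    )
    return RewardJudgment(score=score, reasoning=reasoning)

judgment = judge("Paris is the capital of France.", rubric="factual accuracy")
print(judgment.score)       # rubric-grounded score, not an opaque scalar
print(judgment.reasoning)   # transparent justification a user can audit
```

The point of the structure is that the same `judge` interface works for any rubric string (factual accuracy, helpfulness, safety, and so on), which is the "rubric-agnostic" property the abstract claims, while the paired reasoning field addresses the interpretability gap of scalar-only reward models.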

