

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

May 16, 2025
作者: Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Hoo-Chang Shin, Felipe Soares, Alexander Bukharin, Ellie Evans, Yi Dong, Oleksii Kuchaiev
cs.AI

Abstract

Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks related to STEM, coding and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This represents a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs. We demonstrate that HelpSteer3-Preference can also be applied to train Generative RMs, and we show how policy models can be aligned with RLHF using our RMs. Dataset (CC-BY-4.0): https://huggingface.co/datasets/nvidia/HelpSteer3#preference
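
For readers who want to experiment with the data, the minimal sketch below shows one way to load the preference subset from Hugging Face and compute a standard Bradley-Terry pairwise loss, the usual objective for training reward models from preference pairs. The config name "preference" is inferred from the dataset URL, and the dummy reward scores stand in for a real reward model; the paper's actual training recipe may differ.

```python
# Sketch only: loading HelpSteer3-Preference and computing a Bradley-Terry
# pairwise loss. Config name and field layout are assumptions for illustration.
import torch
import torch.nn.functional as F
from datasets import load_dataset

# Load the preference subset (config name "preference" assumed from the URL;
# this call downloads data from the Hugging Face Hub).
ds = load_dataset("nvidia/HelpSteer3", "preference", split="train")
print(ds[0].keys())  # inspect the available fields for one sample

def bradley_terry_loss(reward_chosen: torch.Tensor,
                       reward_rejected: torch.Tensor) -> torch.Tensor:
    """Standard pairwise preference loss: -log(sigmoid(r_chosen - r_rejected))."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Dummy reward scores in place of outputs from a trained reward model.
r_chosen = torch.tensor([1.2, 0.3])
r_rejected = torch.tensor([0.4, -0.1])
print(bradley_terry_loss(r_chosen, r_rejected).item())
```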
