HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
May 16, 2025
Authors: Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Hoo-Chang Shin, Felipe Soares, Alexander Bukharin, Ellie Evans, Yi Dong, Oleksii Kuchaiev
cs.AI
Abstract
Preference datasets are essential for training general-domain,
instruction-following language models with Reinforcement Learning from Human
Feedback (RLHF). Each subsequent data release raises expectations for future
data collection, meaning there is a constant need to advance the quality and
diversity of openly available preference data. To address this need, we
introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0),
high-quality, human-annotated preference dataset comprising over 40,000
samples. These samples span diverse real-world applications of large language
models (LLMs), including tasks relating to STEM, coding and multilingual
scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that
achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This
represents a substantial improvement (~10% absolute) over the previously
best-reported results from existing RMs. We demonstrate that HelpSteer3-Preference
can also be applied to train Generative RMs, and we show how policy models can be
aligned with RLHF using our RMs. Dataset (CC-BY-4.0):
https://huggingface.co/datasets/nvidia/HelpSteer3#preference
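
As a rough illustration of how a preference dataset like this is typically consumed, the sketch below loads the data with the Hugging Face datasets library and computes a standard Bradley-Terry pairwise loss, the usual objective for training scalar reward models from chosen/rejected response pairs. The subset name "preference" and the reward_model / tokenizer stand-ins are assumptions for illustration only, not the authors' exact training recipe; consult the dataset card for the actual subset and field names.

import torch
import torch.nn.functional as F
from datasets import load_dataset

# Load the preference data (the config name "preference" is an assumption;
# check the dataset card for the exact subset and column names).
ds = load_dataset("nvidia/HelpSteer3", "preference", split="train")
print(ds.column_names)  # inspect the actual fields before training

def bradley_terry_loss(reward_chosen: torch.Tensor,
                       reward_rejected: torch.Tensor) -> torch.Tensor:
    # Pairwise preference loss: -log sigmoid(r_chosen - r_rejected),
    # which pushes the reward of the preferred response above the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical usage with a scalar-output reward model (not provided here):
# r_c = reward_model(**tokenizer(prompt + chosen, return_tensors="pt"))
# r_r = reward_model(**tokenizer(prompt + rejected, return_tensors="pt"))
# loss = bradley_terry_loss(r_c, r_r)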