
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

October 24, 2024
作者: Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou
cs.AI

Abstract

In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
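The reward models above are trained on preference pairs, i.e. (chosen, rejected) response pairs for a prompt. As a minimal sketch of how such pairs are typically turned into a training signal, the snippet below implements the standard Bradley-Terry pairwise objective: the negative log-likelihood that the chosen response is scored above the rejected one. This is a common formulation for preference-based reward modeling, not code from the paper; the scalar scores are illustrative stand-ins for reward-model outputs.

```python
import math

def bradley_terry_loss(rewards_chosen, rewards_rejected):
    """Mean negative log-likelihood that each chosen response
    outranks its rejected counterpart: -log sigmoid(r_c - r_r)."""
    diffs = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    return -sum(math.log(1.0 / (1.0 + math.exp(-d))) for d in diffs) / len(diffs)

# Toy usage: chosen responses scored higher than rejected ones -> small loss.
loss = bradley_terry_loss([2.0, 1.5], [0.5, -0.3])
```

The larger the margin between chosen and rejected scores, the closer the loss gets to zero, which is what pushes the model to separate preferred responses from dispreferred ones.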

