ChatPaper.aiChatPaper

DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging

July 1, 2024
Authors: Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Yun-Nung Chen
cs.AI

Abstract

Reinforcement learning from human feedback (RLHF) is a popular strategy for aligning large language models (LLMs) with desired behaviors. Reward modeling is a crucial step in RLHF. However, collecting paired preference data for training reward models is often costly and time-consuming, especially for domain-specific preferences requiring expert annotation. To address this challenge, we propose the Domain knowledge merged Reward Model (DogeRM), a novel framework that integrates domain-specific knowledge into a general reward model by model merging. The experiments demonstrate that DogeRM enhances performance across different benchmarks and provide a detailed analysis showcasing the effects of model merging, showing the great potential of facilitating model alignment.

Summary

AI-Generated Summary

PDF61November 28, 2024