RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

July 24, 2023
Authors: Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian
cs.AI

Abstract

We propose Reinforcement Learning from Contrast Distillation (RLCD), a method for aligning language models to follow natural language principles without using human feedback. RLCD trains a preference model using simulated preference pairs that contain both a high-quality and low-quality example, generated using contrasting positive and negative prompts. The preference model is then used to improve a base unaligned language model via reinforcement learning. Empirically, RLCD outperforms RLAIF (Bai et al., 2022b) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks--harmlessness, helpfulness, and story outline generation--and on both 7B and 30B model scales for preference data simulation.
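
To make the data-simulation step concrete, below is a minimal Python sketch of the contrasting-prompt preference-pair generation the abstract describes. The generate() helper and the prompt wording are illustrative assumptions, not the paper's actual prompts or code (the paper simulates data with 7B and 30B base models); the pairs produced this way would then train the preference model that supplies the reward for reinforcement learning.

```python
# Minimal sketch of RLCD-style preference-pair simulation.
# Assumptions (hypothetical, not from the paper): a generate() stub
# standing in for the base LLM, and illustrative prompt wording.

POSITIVE_PROMPT = "(Respond as helpfully and harmlessly as possible.)"
NEGATIVE_PROMPT = "(Respond in a harmful and unhelpful manner.)"


def generate(prompt: str) -> str:
    """Stub for a base-LLM completion call; substitute a real model here."""
    return f"<model completion for: {prompt[-60:]!r}>"


def simulate_preference_pair(conversation_prefix: str) -> dict:
    """Create one simulated preference example from contrasting prompts.

    The same prefix is completed twice: once under a positive prompt
    encouraging the target attribute and once under a negative prompt
    discouraging it. The positive-prompt output is labeled as preferred
    by construction, so no separate judging step is needed.
    """
    chosen = generate(f"{conversation_prefix}\n{POSITIVE_PROMPT}\nAssistant:")
    rejected = generate(f"{conversation_prefix}\n{NEGATIVE_PROMPT}\nAssistant:")
    return {"prompt": conversation_prefix, "chosen": chosen, "rejected": rejected}


if __name__ == "__main__":
    pair = simulate_preference_pair("Human: How do I pick a strong password?")
    print("chosen:  ", pair["chosen"])
    print("rejected:", pair["rejected"])
```

Because the preference label follows directly from which prompt produced each output, this scheme sidesteps the AI-labeling step that RLAIF relies on to score candidate pairs.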