

RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

July 24, 2023
作者: Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian
cs.AI

Abstract
We propose Reinforcement Learning from Contrast Distillation (RLCD), a method for aligning language models to follow natural language principles without using human feedback. RLCD trains a preference model using simulated preference pairs that contain both a high-quality and low-quality example, generated using contrasting positive and negative prompts. The preference model is then used to improve a base unaligned language model via reinforcement learning. Empirically, RLCD outperforms RLAIF (Bai et al., 2022b) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks (harmlessness, helpfulness, and story outline generation) and on both 7B and 30B model scales for preference data simulation.
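The core data-simulation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `lm_generate` is a stand-in for any base LM call, and the specific prompt affixes are hypothetical examples of contrasting positive/negative prompts.

```python
# Sketch of RLCD-style simulated preference-pair generation (illustrative only).

def lm_generate(prompt: str) -> str:
    # Placeholder for a real base language-model call (assumption);
    # here it just echoes the prompt so the sketch is runnable.
    return f"[response conditioned on: {prompt}]"

# Hypothetical contrasting affixes encoding a natural-language principle.
POSITIVE_HINT = "(give a helpful, harmless answer)"
NEGATIVE_HINT = "(give an unhelpful, harmful answer)"

def make_preference_pair(user_prompt: str) -> dict:
    """Generate one simulated preference pair from contrasting prompts.

    By construction, the output from the positive prompt is labeled
    preferred, so no separate AI scoring pass (as in RLAIF) is needed.
    """
    chosen = lm_generate(f"{user_prompt} {POSITIVE_HINT}")
    rejected = lm_generate(f"{user_prompt} {NEGATIVE_HINT}")
    return {"prompt": user_prompt, "chosen": chosen, "rejected": rejected}

pair = make_preference_pair("How do I reset my router?")
```

Pairs of this form would then train a preference model, which in turn provides the reward signal for the reinforcement-learning stage.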