SLiC-HF: Sequence Likelihood Calibration with Human Feedback
May 17, 2023
Authors: Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu
cs.AI
Abstract
Learning from human feedback has been shown to be effective at aligning
language models with human preferences. Past work has often relied on
Reinforcement Learning from Human Feedback (RLHF), which optimizes the language
model using reward scores assigned from a reward model trained on human
preference data. In this work, we show that the recently introduced Sequence
Likelihood Calibration (SLiC) can also be used to effectively learn from human
preferences (SLiC-HF). Furthermore, we demonstrate that this can be done with human
feedback data collected for a different model, similar to off-policy, offline
RL data. Automatic and human evaluation experiments on the TL;DR summarization
task show that SLiC-HF significantly improves supervised fine-tuning baselines.
Moreover, SLiC-HF presents a competitive alternative to the PPO RLHF
implementation used in past work, while being much simpler to implement, easier
to tune, and more computationally efficient in practice.
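Since the abstract only names the calibration objective, a minimal sketch may help make the contrast with reward-model-based RLHF concrete. The snippet below is an illustrative PyTorch sketch, not the authors' code: the function names (sequence_log_prob, slic_hf_loss), the hinge margin delta, and the regularization weight are assumptions. It assumes SLiC-HF trains directly on (preferred, rejected) pairs with a rank-calibration hinge loss on sequence log-likelihoods plus a cross-entropy-style regularizer toward the supervised fine-tuning (SFT) targets.

```python
# Minimal sketch of a SLiC-HF style objective, assuming the pairwise
# rank-calibration formulation: a hinge loss on the log-likelihood margin
# between the human-preferred and rejected sequences, plus a regularizer
# toward the supervised fine-tuning (SFT) target. Illustrative only;
# names and hyperparameters are assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F


def sequence_log_prob(logits: torch.Tensor,
                      targets: torch.Tensor,
                      mask: torch.Tensor) -> torch.Tensor:
    """Sum of target-token log-probabilities, i.e. log P_theta(y | x).

    logits:  [batch, seq_len, vocab_size] decoder outputs
    targets: [batch, seq_len] target token ids
    mask:    [batch, seq_len] 1.0 on target tokens, 0.0 on padding
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return (token_log_probs * mask).sum(dim=-1)


def slic_hf_loss(pos_logp: torch.Tensor,
                 neg_logp: torch.Tensor,
                 sft_logp: torch.Tensor,
                 delta: float = 1.0,
                 reg_weight: float = 0.5) -> torch.Tensor:
    """Calibration loss on (preferred, rejected) preference pairs.

    pos_logp: log P_theta(y_preferred | x)
    neg_logp: log P_theta(y_rejected  | x)
    sft_logp: log P_theta(y_sft       | x), the supervised fine-tuning target
    """
    # Hinge on the likelihood margin: push the preferred sequence to be
    # at least `delta` more likely (in log space) than the rejected one.
    rank_loss = torch.clamp(delta - pos_logp + neg_logp, min=0.0)
    # Regularize toward the SFT targets so the policy does not drift.
    reg_loss = -sft_logp
    return (rank_loss + reg_weight * reg_loss).mean()
```

In a training step, the model scores both candidates in a static preference pair (plus the SFT target) and the loss above is minimized with a standard optimizer; no reward-model rollouts or PPO machinery are involved, which is consistent with the abstract's claim that SLiC-HF is simpler to tune and more computationally efficient in practice.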