
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

September 1, 2023
Authors: Harrison Lee, Samrat Phatale, Hassan Mansoor, Kellie Lu, Thomas Mesnard, Colton Bishop, Victor Carbune, Abhinav Rastogi
cs.AI

Abstract

Reinforcement learning from human feedback (RLHF) is effective at aligning large language models (LLMs) to human preferences, but gathering high quality human preference labels is a key bottleneck. We conduct a head-to-head comparison of RLHF vs. RL from AI Feedback (RLAIF) - a technique where preferences are labeled by an off-the-shelf LLM in lieu of humans, and we find that they result in similar improvements. On the task of summarization, human evaluators prefer generations from both RLAIF and RLHF over a baseline supervised fine-tuned model in ~70% of cases. Furthermore, when asked to rate RLAIF vs. RLHF summaries, humans prefer both at equal rates. These results suggest that RLAIF can yield human-level performance, offering a potential solution to the scalability limitations of RLHF.
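
As a rough illustration of the AI-feedback step described in the abstract, the sketch below shows how an off-the-shelf LLM might be prompted to choose between two candidate summaries, yielding preference pairs where human labels would otherwise be collected. The `llm_complete` helper and the prompt wording are hypothetical placeholders, not the paper's actual labeling setup.

```python
# Minimal sketch of AI preference labeling (the core idea behind RLAIF):
# an off-the-shelf LLM, rather than a human rater, picks the preferred
# of two model-generated summaries. `llm_complete` is a placeholder for
# whatever LLM API is available; the prompt is illustrative only.

from dataclasses import dataclass


@dataclass
class PreferencePair:
    context: str   # the document being summarized
    chosen: str    # summary preferred by the AI labeler
    rejected: str  # the other summary


def llm_complete(prompt: str) -> str:
    """Placeholder for a call to an off-the-shelf LLM (assumed available)."""
    raise NotImplementedError("Plug in your LLM provider here.")


def ai_preference_label(document: str, summary_a: str, summary_b: str) -> PreferencePair:
    """Ask the AI labeler which of two candidate summaries is better."""
    prompt = (
        "You will be shown a document and two candidate summaries.\n"
        "Reply with only 'A' or 'B' to indicate the better summary.\n\n"
        f"Document:\n{document}\n\n"
        f"Summary A:\n{summary_a}\n\n"
        f"Summary B:\n{summary_b}\n\n"
        "Preferred summary:"
    )
    verdict = llm_complete(prompt).strip().upper()
    if verdict.startswith("A"):
        return PreferencePair(document, chosen=summary_a, rejected=summary_b)
    return PreferencePair(document, chosen=summary_b, rejected=summary_a)
```

The resulting (chosen, rejected) pairs can then train a reward model and drive RL fine-tuning in the same pipeline that would otherwise consume human preference labels.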
