Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
June 2, 2024
Authors: Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Michael Bernstein, Diyi Yang
cs.AI
Abstract
Language models are aligned to emulate the collective voice of many,
resulting in outputs that align with no one in particular. Steering LLMs away
from generic output is possible through supervised finetuning or RLHF, but
requires prohibitively large datasets for new ad-hoc tasks. We argue that it is
instead possible to align an LLM to a specific setting by leveraging a very
small number (<10) of demonstrations as feedback. Our method, Demonstration
ITerated Task Optimization (DITTO), directly aligns language model outputs to a
user's demonstrated behaviors. Derived using ideas from online imitation
learning, DITTO cheaply generates online comparison data by treating users'
demonstrations as preferred over output from the LLM and its intermediate
checkpoints. We evaluate DITTO's ability to learn fine-grained style and task
alignment across domains such as news articles, emails, and blog posts.
Additionally, we conduct a user study soliciting a range of demonstrations from
participants (N=16). Across our benchmarks and user study, we find that
win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and
other self-play methods by an average of 19 percentage points. By using demonstrations as
feedback directly, DITTO offers a novel method for effective customization of
LLMs.
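The abstract describes DITTO as cheaply generating online comparison data by treating each user demonstration as preferred over outputs from the LLM and its intermediate checkpoints. The sketch below is a minimal illustration of how such comparison pairs might be assembled; it is not the authors' implementation, and the function names, pair format, and sampling counts are assumptions made for the example.

```python
# Illustrative sketch only: assemble DITTO-style comparison pairs in which each
# user demonstration is preferred over samples from the LLM and its checkpoints.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ComparisonPair:
    prompt: str
    preferred: str      # the user's demonstration
    dispreferred: str   # a sample from the LLM or one of its intermediate checkpoints

def build_comparison_pairs(
    demos: List[Tuple[str, str]],              # [(prompt, demonstration), ...]
    checkpoints: List[Callable[[str], str]],   # each maps a prompt to a sampled completion
    samples_per_checkpoint: int = 4,
) -> List[ComparisonPair]:
    """Treat every demonstration as preferred over checkpoint samples for the same prompt."""
    pairs: List[ComparisonPair] = []
    for prompt, demo in demos:
        for sample_fn in checkpoints:
            for _ in range(samples_per_checkpoint):
                candidate = sample_fn(prompt)
                if candidate != demo:            # skip degenerate ties
                    pairs.append(ComparisonPair(prompt, demo, candidate))
    return pairs
```

Under this reading, the resulting pairs would feed a standard preference-optimization update (e.g., a DPO-style loss), with the process repeated as new checkpoints are produced, reflecting the "iterated" aspect named in the method's title.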