Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Abstract

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised fine-tuning or reinforcement learning from human feedback (RLHF), but doing so requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (<10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N=16). Across our benchmarks and user study, DITTO's win-rates exceed those of few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points. By using demonstrations directly as feedback, DITTO offers a novel method for effective customization of LLMs.
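The core data-generation idea above treats each user demonstration as preferred over outputs from the current model and its earlier checkpoints. A minimal sketch of building those comparison pairs follows. It is not the authors' implementation. The `Comparison` record, the `build_comparisons` helper and the toy samplers are hypothetical. Each checkpoint is modeled as a plain function that returns sampled completions. The training step that consumes the pairs is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Comparison:
    """One preference pair: `chosen` is preferred over `rejected`."""
    prompt: str
    chosen: str    # here: the user's demonstration
    rejected: str  # here: a sample from the model or one of its checkpoints


# Hypothetical stand-in for a model checkpoint: (prompt, n) -> n sampled completions.
Sampler = Callable[[str, int], List[str]]


def build_comparisons(
    demos: Sequence[Tuple[str, str]],  # (prompt, demonstration) pairs
    checkpoints: Sequence[Sampler],    # current model plus earlier checkpoints
    samples_per_checkpoint: int = 2,
) -> List[Comparison]:
    """Pair every demonstration against samples drawn from every checkpoint."""
    pairs: List[Comparison] = []
    for prompt, demo in demos:
        for sample_fn in checkpoints:
            for sample in sample_fn(prompt, samples_per_checkpoint):
                if sample != demo:  # skip a degenerate pair if a sample matches the demo exactly
                    pairs.append(Comparison(prompt, demo, sample))
    return pairs


def make_toy_checkpoint(tag: str) -> Sampler:
    """Hypothetical toy sampler that returns deterministic placeholder drafts."""
    def sample(prompt: str, n: int) -> List[str]:
        return [f"{tag} draft {i} for: {prompt}" for i in range(n)]
    return sample


demos = [("Write a short status email", "Hi team, quick update: ...")]
checkpoints = [make_toy_checkpoint("step0"), make_toy_checkpoint("step1")]
pairs = build_comparisons(demos, checkpoints, samples_per_checkpoint=2)
print(len(pairs))  # 4
```

With one demonstration, two checkpoints and two samples each, the sketch yields 1 × 2 × 2 = 4 pairs. The demonstration is the preferred side in every pair. The rejected side comes from the model's own samples, so the pairs can be regenerated cheaply after each training round. That matches the "online" comparison data the abstract describes.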
