Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

Abstract

While large language models have proven effective in a wide range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties. Specifically, RAD uses the reward model to score generations as they are produced and rescales sampling probabilities to favor high-reward tokens. By using a unidirectional reward model, RAD can cache activations from prior generation steps to decrease computational overhead. Through experiments on generating non-toxic and sentiment-controlled text, we demonstrate that RAD performs best among methods that change only the generation procedure and matches the performance of state-of-the-art methods that involve re-training the language model. We further validate that RAD is effective on very large language models while incurring a minimal computational overhead.
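The abstract describes RAD only at a high level. The Python sketch below illustrates its two named ideas under stated assumptions, and is not the paper's implementation. First, the rescaling step is modelled as adding beta times each candidate's reward to its logit among the language model's top-k candidates, followed by a softmax; the additive form, k and beta are assumptions of this sketch. Second, the reward model is a hypothetical toy whose cached prefix state (a running score sum and length) stands in for the cached activations of a unidirectional transformer. TOKEN_SCORES, extend_state, reward and rad_step are illustrative names only.

```python
import math
import random
from typing import Dict, Tuple

# HYPOTHETICAL toy reward model: each token has a fixed score in [0, 1] and the
# sequence reward is the running mean. Its state after a prefix depends only on
# that prefix (unidirectional), so the state can be cached and extended.
TOKEN_SCORES: Dict[str, float] = {
    "kind": 0.9, "great": 0.8, "fine": 0.6, "awful": 0.1, "rude": 0.05,
}
DEFAULT_SCORE = 0.5  # score for tokens missing from the toy table

State = Tuple[float, int]  # (sum of token scores, number of tokens)


def extend_state(state: State, token: str) -> State:
    """Extend a cached prefix state by one token without reprocessing the prefix."""
    total, n = state
    return total + TOKEN_SCORES.get(token, DEFAULT_SCORE), n + 1


def reward(state: State) -> float:
    """Sequence reward in [0, 1]: mean token score (DEFAULT_SCORE if empty)."""
    total, n = state
    return total / n if n else DEFAULT_SCORE


def rad_step(logits: Dict[str, float], prefix_state: State,
             k: int = 3, beta: float = 10.0) -> Dict[str, float]:
    """Return next-token probabilities after reward-based rescaling (sketch)."""
    # 1. Keep only the k candidates the language model scores highest.
    top_k = sorted(logits, key=logits.get, reverse=True)[:k]
    # 2. Score each candidate by extending the cached prefix state by one token
    #    (assumed additive rescaling: logit + beta * reward).
    adjusted = {t: logits[t] + beta * reward(extend_state(prefix_state, t))
                for t in top_k}
    # 3. Softmax over the adjusted top-k logits (max subtracted for stability).
    m = max(adjusted.values())
    exps = {t: math.exp(v - m) for t, v in adjusted.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


# Usage: hypothetical LM logits that favour toxic continuations.
lm_logits = {"awful": 2.0, "rude": 1.5, "fine": 1.0, "kind": 0.5, "great": 0.2}
prefix = extend_state((0.0, 0), "you")  # cached state for a one-token prefix
probs = rad_step(lm_logits, prefix, k=3, beta=10.0)
next_token = random.choices(list(probs), weights=list(probs.values()))[0]
```

With beta=0.0 the same call reduces to plain top-k sampling and "awful" is the most probable token; with beta=10.0 the reward term makes "fine" the most probable. In this toy each candidate is scored by a single O(1) update of the cached prefix state rather than a full pass over the prefix; with a transformer reward model, the analogue is reusing the cached activations of the prefix and computing only the new token's position.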