

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

October 14, 2023
Authors: Haikang Deng, Colin Raffel
cs.AI

Abstract

While large language models have proven effective in a huge range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties. Specifically, RAD uses the reward model to score generations as they are produced and rescales sampling probabilities to favor high-reward tokens. By using a unidirectional reward model, RAD can cache activations from prior generation steps to decrease computational overhead. Through experiments on generating non-toxic and sentiment-controlled text, we demonstrate that RAD performs best among methods that change only the generation procedure and matches the performance of state-of-the-art methods that involve re-training the language model. We further validate that RAD is effective on very large language models while incurring a minimal computational overhead.
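
The decoding loop the abstract describes is compact: at each step, score the top-k candidate next tokens with the reward model and shift the sampling distribution toward high-reward continuations. The following is a minimal PyTorch sketch of that idea, not the authors' reference implementation; the function name rad_generate and the hyperparameters top_k and beta are illustrative, and the reward model is assumed to return a scalar score for a token sequence. For simplicity it rescores each candidate from scratch, whereas the unidirectional reward model in the paper lets activations from prior steps be cached.

import torch

@torch.no_grad()
def rad_generate(lm, reward_model, input_ids, max_new_tokens=32, top_k=20, beta=5.0):
    # Assumes HuggingFace-style interfaces: lm(input_ids).logits has shape
    # [batch, seq_len, vocab]; reward_model(ids) returns a scalar reward.
    for _ in range(max_new_tokens):
        logits = lm(input_ids).logits[:, -1, :]       # next-token logits, [1, vocab]
        topk = torch.topk(logits, top_k, dim=-1)      # restrict to top-k candidates

        # Score each candidate continuation with the reward model. In the
        # paper, a unidirectional reward model allows caching activations
        # for the shared prefix; here we recompute for clarity.
        rewards = []
        for tok in topk.indices[0]:
            candidate = torch.cat([input_ids, tok.view(1, 1)], dim=-1)
            rewards.append(reward_model(candidate))   # assumed scalar output
        rewards = torch.stack(rewards).view(1, -1)

        # Rescale sampling probabilities to favor high-reward tokens.
        probs = torch.softmax(topk.values + beta * rewards, dim=-1)
        next_tok = topk.indices.gather(-1, torch.multinomial(probs, 1))
        input_ids = torch.cat([input_ids, next_tok], dim=-1)
    return input_ids

Here beta trades off fluency against attribute control: beta = 0 recovers ordinary top-k sampling from the base language model, while larger values weight the reward model's preferences more heavily.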