Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
October 14, 2023
Authors: Haikang Deng, Colin Raffel
cs.AI
Abstract
While large language models have proven effective in a wide range of
downstream applications, they often generate text that is problematic or lacks
a desired attribute. In this paper, we introduce Reward-Augmented Decoding
(RAD), a text generation procedure that uses a small unidirectional reward
model to encourage a language model to generate text that has certain
properties. Specifically, RAD uses the reward model to score generations as
they are produced and rescales sampling probabilities to favor high-reward
tokens. By using a unidirectional reward model, RAD can cache activations from
prior generation steps to decrease computational overhead. Through experiments
on generating non-toxic and sentiment-controlled text, we demonstrate that RAD
performs best among methods that change only the generation procedure and
matches the performance of state-of-the-art methods that involve re-training
the language model. We further validate that RAD is effective on very large
language models while incurring a minimal computational overhead.
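
To make the rescaling step concrete, here is a minimal PyTorch sketch of one RAD-style decoding step. The function `rad_step`, the `top_k` and `beta` parameters, and the toy tensors are illustrative assumptions rather than the paper's exact interface; in RAD, the candidate rewards would come from the small unidirectional reward model scoring each candidate continuation.

```python
# A minimal sketch of RAD's per-step rescaling, under assumed interfaces:
# `lm_logits` stands in for the base LM's next-token logits, and
# `candidate_rewards` for scores from the small unidirectional reward model.
import torch

def rad_step(lm_logits, candidate_rewards, top_k=20, beta=10.0):
    """Sample one token, steering the top-k candidates by their rewards.

    lm_logits: (vocab_size,) next-token logits from the language model.
    candidate_rewards: (top_k,) reward scores for each top-k candidate
        continuation (e.g. predicted non-toxicity), aligned with the
        top-k token order.
    beta: steering strength; larger values weight the reward more heavily.
    """
    topk = torch.topk(lm_logits, top_k)
    # Rescale: add beta-weighted rewards to the top-k logits, renormalize,
    # and sample, so that high-reward tokens become more likely.
    probs = torch.softmax(topk.values + beta * candidate_rewards, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk.indices[choice].item()

# Toy usage with random tensors in place of real model outputs.
logits = torch.randn(50_000)
rewards = torch.rand(20)  # e.g. reward-model scores in [0, 1]
next_token_id = rad_step(logits, rewards)
```

Because the reward model is unidirectional (causal), the activations for the already-generated prefix can be cached and reused when scoring each of the top-k candidates at every step, which is what keeps the added computational overhead small.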