Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

Abstract

While large language models have proven effective in a wide range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties. Specifically, RAD uses the reward model to score generations as they are produced and rescales sampling probabilities to favor high-reward tokens. By using a unidirectional reward model, RAD can cache activations from prior generation steps to decrease computational overhead. Through experiments on generating non-toxic and sentiment-controlled text, we demonstrate that RAD performs best among methods that change only the generation procedure and matches the performance of state-of-the-art methods that involve re-training the language model. We further validate that RAD is effective on very large language models while incurring minimal computational overhead.
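The rescaling step described in the abstract can be illustrated with a small sketch. The abstract only states that sampling probabilities are rescaled to favor high-reward tokens, so the specific choices here are assumptions made for illustration: restricting to the top-k candidates, adding beta times the reward to each logit, and the names reward_augmented_probs, reward_fn, k, and beta are all hypothetical. The toy reward function stands in for a learned reward model, and activation caching is not modeled.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_augmented_probs(logits, reward_fn, prefix, k=3, beta=5.0):
    """Return {token_id: probability} over the top-k candidates.

    Assumed scheme (not confirmed by the abstract): each candidate's
    logit is shifted by beta * reward before renormalizing.
    """
    # Keep the k token ids the language model scores highest.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Score each candidate continuation with the hypothetical reward function.
    adjusted = [logits[i] + beta * reward_fn(prefix + [i]) for i in top]
    return dict(zip(top, softmax(adjusted)))


def sample(probs, rng=random):
    """Draw one token id from the rescaled distribution."""
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


# Toy setup: a 4-token vocabulary and a reward that prefers token 1.
toy_logits = [2.0, 1.5, 0.5, -1.0]


def toy_reward(tokens):
    return 1.0 if tokens[-1] == 1 else 0.0


probs = reward_augmented_probs(toy_logits, toy_reward, prefix=[0], k=3, beta=2.0)
baseline = reward_augmented_probs(toy_logits, toy_reward, prefix=[0], k=3, beta=0.0)
next_token = sample(probs, random.Random(0))
```

With beta set to 0 the distribution reduces to plain top-k sampling, so token 0 stays the most likely; with beta set to 2 the reward shifts the most probability mass onto token 1. In a real system the reward function would be a unidirectional model that reuses cached activations for the shared prefix instead of rescoring it for every candidate.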
