Contrastive Decoding Improves Reasoning in Large Language Models

Abstract

We demonstrate that Contrastive Decoding, a simple, computationally light, and training-free text generation method proposed by Li et al. (2022), achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted difference in likelihood between a strong model and a weak model. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5, and PaLM 2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA 2, GPT-3.5, and PaLM-540B on the GSM8K math word reasoning benchmark, in addition to improvements on a collection of other tasks. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors, as well as by avoiding simpler modes such as copying sections of the input during chain-of-thought. Overall, Contrastive Decoding outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general-purpose method for generating text from language models.
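The phrase "weighted difference in likelihood between strong and weak models" can be made concrete with a small token-level sketch. The abstract does not specify the exact scoring rule, so everything below is an illustrative assumption rather than the authors' implementation: the function names, the weight `beta`, the plausibility threshold `alpha`, and the `(1 + beta) * strong - beta * weak` form of the score are all hypothetical choices, and the logits are toy values.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(strong_logits, weak_logits, beta=0.5, alpha=0.1):
    """Score each candidate token by a weighted log-likelihood difference.

    ASSUMPTIONS (not taken from the abstract): `beta` weights the
    contrast, and `alpha` masks tokens whose strong-model probability is
    below alpha times that of the strong model's top token.
    """
    strong_lp = log_softmax(strong_logits)
    weak_lp = log_softmax(weak_logits)
    cutoff = math.log(alpha) + max(strong_lp)
    scores = []
    for s, w in zip(strong_lp, weak_lp):
        if s < cutoff:
            # Implausible under the strong model: never select it.
            scores.append(float("-inf"))
        else:
            scores.append((1 + beta) * s - beta * w)
    return scores


def pick_token(strong_logits, weak_logits, beta=0.5, alpha=0.1):
    """Return the index of the highest-scoring token."""
    scores = contrastive_scores(strong_logits, weak_logits, beta, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy three-token vocabulary (made-up logits).
strong = [2.0, 1.9, -3.0]
weak = [2.0, 0.0, 3.0]
greedy_choice = max(range(len(strong)), key=strong.__getitem__)
contrastive_choice = pick_token(strong, weak)
print(greedy_choice, contrastive_choice)
```

In this toy case, greedy decoding on the strong model alone selects token 0, which the weak model also rates highly. The contrastive score instead selects token 1, which the strong model rates almost as highly and the weak model rates much lower. Token 2 is masked because the strong model considers it implausible, even though the weak model prefers it. Setting `beta=0` recovers greedy decoding over the plausible tokens.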
