Contrastive Decoding Improves Reasoning in Large Language Models
September 17, 2023
Authors: Sean O'Brien, Mike Lewis
cs.AI
Abstract
We demonstrate that Contrastive Decoding -- a simple, computationally light,
and training-free text generation method proposed by Li et al. (2022) -- achieves
large out-of-the-box improvements over greedy decoding on a variety of
reasoning tasks. Originally shown to improve the perceived quality of long-form
text generation, Contrastive Decoding searches for strings that maximize a
weighted difference in likelihood between strong and weak models. We show that
Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM
2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA
2, GPT-3.5 and PaLM-540B on the GSM8K math word reasoning benchmark, in
addition to improvements on a collection of other tasks. Analysis suggests that
Contrastive Decoding improves over existing methods by preventing some abstract
reasoning errors, as well as by avoiding simpler modes such as copying sections
of the input during chain-of-thought. Overall, Contrastive Decoding outperforms
nucleus sampling for long-form generation and greedy decoding for reasoning
tasks, making it a powerful general-purpose method for generating text from
language models.
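
The scoring rule the abstract describes fits in a few lines. Below is a minimal, illustrative sketch of a single contrastive decoding step in PyTorch, using the beta-weighted likelihood difference and the alpha plausibility mask described in the paper; the names (`expert_logits`, `amateur_logits`, `alpha`, `beta`) and the default values are assumptions for illustration, not an official implementation.

```python
import torch

def contrastive_decoding_step(expert_logits, amateur_logits, alpha=0.1, beta=0.5):
    """Select the next token by maximizing a weighted difference in
    log-likelihood between a strong (expert) and a weak (amateur) model."""
    expert_logprobs = torch.log_softmax(expert_logits, dim=-1)
    amateur_logprobs = torch.log_softmax(amateur_logits, dim=-1)

    # Plausibility mask: keep only tokens whose expert probability is within
    # a factor of alpha of the expert's most likely token.
    cutoff = expert_logprobs.max(dim=-1, keepdim=True).values + torch.log(
        torch.tensor(alpha)
    )
    plausible = expert_logprobs >= cutoff

    # Contrastive score: reward tokens the expert prefers far more than the
    # amateur does. This is the paper's (1 + beta) * expert - beta * amateur
    # form; the argmax is the same whether applied to logits or log-probs.
    scores = (1 + beta) * expert_logprobs - beta * amateur_logprobs
    scores = scores.masked_fill(~plausible, float("-inf"))
    return scores.argmax(dim=-1)

# Example with random logits over a toy vocabulary of 10 tokens.
expert = torch.randn(10)
amateur = torch.randn(10)
next_token = contrastive_decoding_step(expert, amateur)
```

In practice these scores can stand in for the raw logits in greedy decoding or beam search; the alpha mask prevents the amateur's dispreferences from promoting tokens the expert itself considers implausible.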