Contrastive Decoding Improves Reasoning in Large Language Models

September 17, 2023
Authors: Sean O'Brien, Mike Lewis
cs.AI

Abstract

We demonstrate that Contrastive Decoding -- a simple, computationally light, and training-free text generation method proposed by Li et al. (2022) -- achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted difference in likelihood between a strong and a weak model. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5, and PaLM 2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA 2, GPT-3.5, and PaLM-540B on the GSM8K math word reasoning benchmark, in addition to improvements on a collection of other tasks. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors, as well as by avoiding simpler modes such as copying sections of the input during chain-of-thought reasoning. Overall, Contrastive Decoding outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general-purpose method for generating text from language models.
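
As a concrete illustration of the search objective described in the abstract, the sketch below implements a single greedy Contrastive Decoding step in PyTorch: tokens the strong ("expert") model itself finds implausible are masked out, and the remaining tokens are scored by a weighted difference between the strong and weak ("amateur") model logits. This is a minimal sketch, not the paper's implementation; the function name and the default alpha/beta values are illustrative assumptions.

```python
import math

import torch
import torch.nn.functional as F


def contrastive_decoding_step(expert_logits: torch.Tensor,
                              amateur_logits: torch.Tensor,
                              alpha: float = 0.1,
                              beta: float = 0.5) -> torch.Tensor:
    """Greedily pick the next token by Contrastive Decoding.

    expert_logits / amateur_logits: next-token logits of shape
    (batch, vocab) from the strong and weak model respectively.
    alpha and beta are illustrative defaults, not values from the paper.
    """
    expert_logprobs = F.log_softmax(expert_logits, dim=-1)
    # Plausibility constraint: keep only tokens whose expert probability
    # is at least an alpha-fraction of the expert's top probability.
    cutoff = math.log(alpha) + expert_logprobs.max(dim=-1, keepdim=True).values
    # Weighted strong-minus-weak difference, expressed in logit space.
    cd_scores = (1 + beta) * expert_logits - beta * amateur_logits
    cd_scores = cd_scores.masked_fill(expert_logprobs < cutoff, float("-inf"))
    return cd_scores.argmax(dim=-1)


# Toy usage: random logits stand in for the two models' next-token outputs.
vocab_size = 32
expert = torch.randn(1, vocab_size)
amateur = torch.randn(1, vocab_size)
print(contrastive_decoding_step(expert, amateur))  # tensor([token_id])
```

In a real decoding loop, the two logit tensors would come from running the strong and weak language models on the same prefix at each step; the plausibility mask is what keeps the method from promoting tokens that merely happen to be even less likely under the weak model.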