Chain-of-Thought Reasoning Without Prompting
February 15, 2024
Authors: Xuezhi Wang, Denny Zhou
cs.AI
Abstract
In enhancing the reasoning capabilities of large language models (LLMs),
prior research primarily focuses on specific prompting techniques such as
few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while
effective, often involve manually intensive prompt engineering. Our study takes
a novel approach by asking: Can LLMs reason effectively without prompting? Our
findings reveal that, intriguingly, CoT reasoning paths can be elicited from
pre-trained LLMs by simply altering the decoding process. Rather than
conventional greedy decoding, we investigate the top-k alternative tokens,
uncovering that CoT paths are frequently inherent in these sequences. This
approach not only bypasses the confounders of prompting but also allows us to
assess the LLMs' intrinsic reasoning abilities. Moreover, we observe
that the presence of a CoT in the decoding path correlates with a higher
confidence in the model's decoded answer. This confidence metric effectively
differentiates between CoT and non-CoT paths. Extensive empirical studies on
various reasoning benchmarks show that the proposed CoT-decoding substantially
outperforms the standard greedy decoding.
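The abstract describes the mechanics at a high level: instead of committing to the greedy first token, branch on the top-k alternative first tokens, continue decoding each branch, and prefer the path whose answer the model decodes with the highest confidence. Below is a minimal sketch of that procedure using the Hugging Face `transformers` API. The model checkpoint, the value of k, and the confidence measure (here, the average probability margin between the top two tokens over the whole continuation, rather than over just the answer span) are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch of CoT-decoding as described in the abstract.
# Checkpoint name, k, and the confidence measure are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM checkpoint can stand in here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def cot_decode(prompt: str, k: int = 10, max_new_tokens: int = 64):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_logits = model(**inputs).logits[0, -1]      # logits for the first new token
    branch_tokens = torch.topk(next_logits, k).indices   # top-k alternatives, not just greedy

    candidates = []
    for token in branch_tokens:
        # Force a different first token, then continue greedily from it.
        ids = torch.cat([inputs.input_ids[0], token.view(1)]).unsqueeze(0)
        out = model.generate(
            ids,
            do_sample=False,
            max_new_tokens=max_new_tokens,
            output_scores=True,
            return_dict_in_generate=True,
            pad_token_id=tokenizer.eos_token_id,
        )
        # Confidence: average probability margin between the top-1 and top-2
        # tokens at each decoding step. (Averaging over the whole continuation
        # instead of only the answer tokens is a simplification.)
        margins = []
        for step_scores in out.scores:
            top2 = torch.topk(torch.softmax(step_scores[0], dim=-1), 2).values
            margins.append((top2[0] - top2[1]).item())
        confidence = sum(margins) / len(margins)
        text = tokenizer.decode(out.sequences[0, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)
        candidates.append((confidence, text))

    # Per the abstract, the highest-confidence path tends to contain a CoT.
    return max(candidates)

confidence, path = cot_decode("Q: I have 3 apples, my dad has 2 more apples than me. "
                              "How many apples do we have in total?\nA:")
print(f"confidence={confidence:.3f}\n{path}")
```

Branching only at the first token keeps the search cheap (k forward passes instead of a full tree search) while still escaping the greedy path, which is the core observation the abstract attributes to CoT-decoding.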