The Impact of Reasoning Step Length on Large Language Models

January 10, 2024
Authors: Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du
cs.AI

Abstract

Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correlation between the effectiveness of CoT and the length of reasoning steps in prompts remains largely unknown. To shed light on this, we have conducted several empirical experiments to explore these relations. Specifically, we design experiments that expand and compress the rationale reasoning steps within CoT demonstrations, while keeping all other factors constant. We have the following key findings. First, the results indicate that lengthening the reasoning steps in prompts, even without adding new information to the prompt, considerably enhances LLMs' reasoning abilities across multiple datasets. Conversely, shortening the reasoning steps, even while preserving the key information, significantly diminishes the reasoning abilities of models. This finding highlights the importance of the number of steps in CoT prompts and provides practical guidance for making better use of LLMs' potential in complex problem-solving scenarios. Second, we also investigated the relationship between the performance of CoT and the rationales used in demonstrations. Surprisingly, the results show that even incorrect rationales can yield favorable outcomes if they maintain the requisite length of inference. Third, we observed that the advantages of increasing reasoning steps are task-dependent: simpler tasks require fewer steps, whereas complex tasks gain significantly from longer inference sequences.
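To make the expand/compress manipulation concrete, here is a minimal illustrative sketch (not the authors' code) of how a single CoT demonstration can be lengthened or shortened while the question, the final answer, and the underlying information stay fixed; the demonstration text and the `build_prompt` helper are hypothetical examples.

```python
# Hypothetical sketch: two few-shot CoT demonstrations for the same question,
# differing only in the number of reasoning steps. The "expanded" rationale
# restates intermediate results without introducing new information, mirroring
# the step-lengthening manipulation described in the abstract.

COMPRESSED_DEMO = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: 5 + 2 * 3 = 11. The answer is 11."""

EXPANDED_DEMO = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: Roger starts with 5 balls.
He buys 2 cans, and each can holds 3 balls.
2 cans of 3 balls is 2 * 3 = 6 balls.
Restating the intermediate result: he gains 6 new balls in total.
Adding the new balls to the original ones gives 5 + 6 = 11.
The answer is 11."""


def build_prompt(demonstration: str, question: str) -> str:
    """Prepend a single CoT demonstration to the test question."""
    return f"{demonstration}\n\nQ: {question}\nA:"


if __name__ == "__main__":
    question = "A baker makes 4 trays of 6 muffins. How many muffins are there in total?"
    # Everything except the rationale length is held constant between the two prompts.
    print(build_prompt(COMPRESSED_DEMO, question))
    print("---")
    print(build_prompt(EXPANDED_DEMO, question))
```

Under this setup, comparing model accuracy on the same test questions with the compressed versus expanded demonstrations isolates the effect of reasoning-step length from the information content of the prompt.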