

The Impact of Reasoning Step Length on Large Language Models

January 10, 2024
Authors: Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du
cs.AI

Abstract

Chain of Thought (CoT) prompting is significant in improving the reasoning abilities of large language models (LLMs). However, the correlation between the effectiveness of CoT and the length of the reasoning steps in prompts remains largely unknown. To shed light on this, we conducted several empirical experiments to explore this relationship. Specifically, we design experiments that expand and compress the reasoning steps within CoT demonstrations while keeping all other factors constant. We have the following key findings. First, the results indicate that lengthening the reasoning steps in prompts, even without adding new information to the prompt, considerably enhances LLMs' reasoning abilities across multiple datasets. Conversely, shortening the reasoning steps, even while preserving the key information, significantly diminishes the reasoning abilities of models. This finding highlights the importance of the number of steps in CoT prompts and provides practical guidance for making better use of LLMs' potential in complex problem-solving scenarios. Second, we also investigated the relationship between the performance of CoT and the rationales used in demonstrations. Surprisingly, the results show that even incorrect rationales can yield favorable outcomes if they maintain the requisite length of inference. Third, we observed that the advantages of increasing reasoning steps are task-dependent: simpler tasks require fewer steps, whereas complex tasks gain significantly from longer inference sequences.
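The core manipulation the abstract describes — expanding or compressing the reasoning steps of a fixed CoT demonstration while the question and answer stay unchanged — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the helper names, the repeat/keep-every strategies, and the toy demonstration are all assumptions.

```python
# Illustrative sketch of expanding/compressing the rationale in a CoT
# demonstration. The expansion restates existing steps (adding length but
# no new information); the compression drops steps while keeping the rest.
# All names and the example demo are hypothetical, not from the paper.

def build_cot_prompt(question, steps, answer):
    """Render one CoT demonstration as a prompt block."""
    lines = [f"Q: {question}"]
    lines += [f"Step {i}: {s}" for i, s in enumerate(steps, 1)]
    lines.append(f"A: {answer}")
    return "\n".join(lines)

def expand_steps(steps, repeats=2):
    """Lengthen the rationale by restating each step `repeats` times."""
    return [s for step in steps for s in [step] * repeats]

def compress_steps(steps, keep_every=2):
    """Shorten the rationale by keeping only every k-th step."""
    return steps[::keep_every]

demo_steps = [
    "There are 3 apples and 2 oranges.",
    "3 + 2 equals 5.",
]
long_prompt = build_cot_prompt("How many fruits are there?",
                               expand_steps(demo_steps), "5")
short_prompt = build_cot_prompt("How many fruits are there?",
                                compress_steps(demo_steps), "5")
```

Under this sketch, `long_prompt` and `short_prompt` differ only in the number of rationale lines, which is exactly the variable the paper's experiments isolate.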