The Impact of Reasoning Step Length on Large Language Models

Abstract

Chain of Thought (CoT) prompting plays a significant role in improving the reasoning abilities of large language models (LLMs). However, the correlation between the effectiveness of CoT and the length of the reasoning steps in prompts remains largely unknown. To shed light on this, we conducted several empirical experiments to explore the relationship. Specifically, we designed experiments that expand and compress the rationale reasoning steps within CoT demonstrations while keeping all other factors constant. We report the following key findings. First, lengthening the reasoning steps in prompts, even without adding new information, considerably enhances LLMs' reasoning abilities across multiple datasets. Conversely, shortening the reasoning steps, even while preserving the key information, significantly diminishes the models' reasoning abilities. This finding highlights the importance of the number of steps in CoT prompts and offers practical guidance for making better use of LLMs' potential in complex problem-solving scenarios. Second, we investigated the relationship between CoT performance and the rationales used in demonstrations. Surprisingly, the results show that even incorrect rationales can yield favorable outcomes if they maintain the requisite length of inference. Third, the advantages of increasing reasoning steps are task-dependent: simpler tasks require fewer steps, whereas complex tasks benefit significantly from longer inference sequences.
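The expand/compress manipulation can be pictured with a small sketch. It assumes a demonstration is stored as a question, an ordered list of rationale steps, and a final answer. The names `Demonstration`, `compress`, `expand`, and `render` are hypothetical. The restatement step added by `expand` is only one illustrative way to add a step without adding information; it is not the expansion procedure used in the experiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    """One few-shot CoT example: question, ordered rationale steps, final answer."""
    question: str
    steps: List[str]
    answer: str


def compress(demo: Demonstration, group: int = 2) -> Demonstration:
    """Merge every `group` consecutive steps into a single step.

    The text of every original step is kept, so no information is lost;
    only the number of explicit reasoning steps shrinks. The question and
    answer are left unchanged so other factors stay constant.
    """
    merged = [
        " ".join(demo.steps[i:i + group])
        for i in range(0, len(demo.steps), group)
    ]
    return Demonstration(demo.question, merged, demo.answer)


def expand(demo: Demonstration) -> Demonstration:
    """Prepend a step that only restates the question (hypothetical strategy).

    The step count grows by one, but no new information is introduced,
    and the question and answer are left unchanged.
    """
    restate = f"First, restate the question: {demo.question}"
    return Demonstration(demo.question, [restate] + demo.steps, demo.answer)


def render(demo: Demonstration) -> str:
    """Format a demonstration as prompt text with numbered steps."""
    lines = [f"Q: {demo.question}", "A:"]
    lines += [f"Step {i}: {s}" for i, s in enumerate(demo.steps, start=1)]
    lines.append(f"The answer is {demo.answer}.")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = Demonstration(
        question="Tom has 3 apples, buys 2 more, then eats 1. How many are left?",
        steps=[
            "Tom starts with 3 apples.",
            "After buying 2 more he has 3 + 2 = 5.",
            "After eating 1 he has 5 - 1 = 4.",
        ],
        answer="4",
    )
    print(len(compress(demo).steps))  # 2
    print(len(expand(demo).steps))    # 4
    print(render(expand(demo)))
```

Because only the rationale list changes between variants, each rendered prompt differs from the original solely in its number of steps, which mirrors the "all other factors constant" condition in a simplified form.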
