Chain-of-Thought Reasoning Without Prompting

Abstract

Prior work on improving the reasoning capabilities of large language models (LLMs) has focused primarily on specific prompting techniques, such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods are effective but often require labor-intensive prompt engineering. Our study takes a novel approach by asking: can LLMs reason effectively without prompting? We find that CoT reasoning paths can be elicited from pre-trained LLMs simply by altering the decoding process. Rather than relying on conventional greedy decoding, we investigate the top-k alternative tokens and find that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also lets us assess the LLMs' intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with higher confidence in the model's decoded answer, and this confidence metric effectively differentiates CoT paths from non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding substantially outperforms standard greedy decoding.
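
The decoding idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and several details are assumptions made for illustration. Branching happens only at the first decoding step. Confidence is taken as the average gap between the top two token probabilities along each continuation, since the abstract does not say how confidence is computed. The language model is replaced by a hypothetical lookup table (`TOY`) so the example runs without any external library. All function names are invented for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]                      # token -> probability
NextDist = Callable[[Tuple[str, ...]], Dist]  # prefix -> next-token distribution


def top_k(dist: Dist, k: int) -> List[Tuple[str, float]]:
    """Return the k most probable (token, probability) pairs."""
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]


def greedy_continue(next_dist: NextDist, prefix: Tuple[str, ...], max_len: int = 8):
    """Extend a prefix greedily, recording the top-1 minus top-2 margin per step."""
    tokens, margins = list(prefix), []
    while len(tokens) < max_len:
        dist = next_dist(tuple(tokens))
        if not dist:
            break
        ranked = top_k(dist, 2)
        token, p1 = ranked[0]
        p2 = ranked[1][1] if len(ranked) > 1 else 0.0
        if token == "<eos>":
            break
        tokens.append(token)
        margins.append(p1 - p2)
    return tokens, margins


def cot_decode(next_dist: NextDist, k: int = 3, max_len: int = 8):
    """Branch on the top-k first tokens, continue greedily, rank by confidence."""
    paths = []
    for token, _ in top_k(next_dist(()), k):
        tokens, margins = greedy_continue(next_dist, (token,), max_len)
        # Assumed confidence: mean probability margin over the continuation.
        confidence = sum(margins) / len(margins) if margins else 0.0
        paths.append((tokens, confidence))
    best = max(paths, key=lambda path: path[1])
    return best, paths


# Hypothetical toy "model": the greedy first token ("5") leads to a direct,
# low-margin answer, while an alternative first token ("I") leads to a
# step-by-step path whose tokens are predicted with larger margins.
TOY: Dict[Tuple[str, ...], Dist] = {
    (): {"5": 0.5, "I": 0.3, "You": 0.2},
    ("5",): {"apples": 0.55, "<eos>": 0.45},
    ("5", "apples"): {"<eos>": 1.0},
    ("I",): {"have": 0.9, "need": 0.1},
    ("I", "have"): {"3+5=8": 0.95, "5": 0.05},
    ("I", "have", "3+5=8"): {"<eos>": 1.0},
    ("You",): {"have": 0.6, "<eos>": 0.4},
    ("You", "have"): {"5": 0.55, "8": 0.45},
    ("You", "have", "5"): {"<eos>": 1.0},
}

best, paths = cot_decode(lambda prefix: TOY.get(prefix, {}), k=3)
for tokens, confidence in paths:
    print(" ".join(tokens), round(confidence, 3))
print("selected:", " ".join(best[0]))
```

In this toy setup the first entry of `paths` is the ordinary greedy path, so comparing it with `best` shows how a higher-confidence alternative path can be preferred over the greedy one. With a real model, `next_dist` would be backed by the model's next-token probabilities, and the confidence would more plausibly be measured on the answer tokens only.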
