Measuring Faithfulness in Chain-of-Thought Reasoning

Abstract

Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.
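
As a rough illustration of the intervention idea described in the abstract, the following minimal sketch measures how often a final answer changes when the CoT is modified. It is not the authors' code. The names answer_change_rate, toy_answer_fn, and add_mistake are hypothetical, and the toy "model" simply reads its answer off the last token of the CoT so that the example runs without a real LLM.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: a question paired with the chain of thought written for it.
Example = Tuple[str, str]
AnswerFn = Callable[[str, str], str]   # (question, cot) -> final answer
Intervention = Callable[[str], str]    # cot -> modified cot


def answer_change_rate(examples: Sequence[Example],
                       answer_fn: AnswerFn,
                       intervene: Intervention) -> float:
    """Fraction of examples whose final answer changes after the CoT is modified."""
    if not examples:
        raise ValueError("need at least one example")
    changed = 0
    for question, cot in examples:
        original = answer_fn(question, cot)
        modified = answer_fn(question, intervene(cot))
        if original != modified:
            changed += 1
    return changed / len(examples)


# Toy stand-ins so the sketch runs without a real model (assumptions only).
def toy_answer_fn(question: str, cot: str) -> str:
    # Pretend the model answers with the last whitespace-separated token of the CoT.
    return cot.split()[-1]


def add_mistake(cot: str) -> str:
    # Crude "mistake": overwrite the final token with a fixed value.
    tokens = cot.split()
    tokens[-1] = "0"
    return " ".join(tokens)


examples = [("What is 2 + 3?", "2 plus 3 equals 5"),
            ("What is 0 * 7?", "anything times 0 is 0")]
rate = answer_change_rate(examples, toy_answer_fn, add_mistake)
print(rate)
```

Under this framing, a high change rate would suggest the answer depends on the stated reasoning, while a rate near zero would suggest the CoT is largely ignored. A real study would replace toy_answer_fn with calls to an actual model.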
