SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

Abstract

Recent progress in large language models (LLMs), especially the invention of chain-of-thought (CoT) prompting, has made it possible to solve reasoning problems. However, even the strongest LLMs still struggle with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs can recognize their own errors without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within step-by-step reasoning. To this end, we propose a zero-shot verification scheme to recognize such errors. We then use this verification scheme to improve question-answering performance by performing weighted voting over different generated answers. We test the method on three math datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final predictive performance.
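The weighted voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each generated solution has already been reduced to a final answer and a confidence score in [0, 1] produced by some verifier; how the scores are computed is not specified in the abstract, so the function name `weighted_vote`, the `(answer, confidence)` pair format, and the example numbers are all hypothetical and not the authors' implementation.

```python
from collections import Counter, defaultdict


def weighted_vote(candidates):
    """Return the answer whose solutions have the largest total confidence.

    `candidates` is a list of (answer, confidence) pairs. The confidence is
    assumed to come from a separate verification step (hypothetical here).
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    totals = defaultdict(float)
    for answer, confidence in candidates:
        totals[answer] += confidence  # sum the confidence mass per distinct answer
    return max(totals, key=totals.get)


def majority_vote(candidates):
    """Unweighted baseline: every sampled solution counts as one vote."""
    counts = Counter(answer for answer, _ in candidates)
    return counts.most_common(1)[0][0]


# Illustrative (made-up) scores: three low-confidence solutions agree on "20",
# while two higher-confidence solutions agree on "18".
candidates = [("18", 0.9), ("20", 0.3), ("18", 0.4), ("20", 0.35), ("20", 0.2)]

print(majority_vote(candidates))  # "20": 3 votes against 2
print(weighted_vote(candidates))  # "18": total confidence 1.3 against 0.85
```

In this toy case plain majority voting selects "20", while confidence weighting selects "18", which shows how a verifier that down-weights flawed solutions could change the final prediction.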
