Large Language Models Cannot Self-Correct Reasoning Yet

Abstract

Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance might even degrade post self-correction. Drawing from these insights, we offer suggestions for future research and practical applications in this field.
