Internal Consistency and Self-Feedback in Large Language Models: A Survey

Abstract

Large language models (LLMs) are expected to respond accurately, but they often exhibit deficient reasoning or generate hallucinatory content. To address these issues, a line of studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, has emerged. These studies share a common approach: the LLM evaluates and updates itself to mitigate the issues. Nonetheless, these efforts have not been summarized from a unified perspective, as existing surveys predominantly focus on categorization without examining the motivations behind these works. In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as the lack of reasoning and the presence of hallucinations. Internal Consistency assesses the coherence among an LLM's latent layer, decoding layer, and response layer based on sampling methodologies. Expanding upon the Internal Consistency framework, we introduce a streamlined yet effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback framework consists of two modules, Self-Evaluation and Self-Update, and has been employed in numerous studies. We systematically classify these studies by tasks and lines of work, summarize relevant evaluation methods and benchmarks, and delve into the concern, "Does Self-Feedback Really Work?" We propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data, available at https://github.com/IAAR-Shanghai/ICSFSurvey.
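As a rough illustration of the two modules named in the abstract, the sketch below pairs a Self-Evaluation step with a Self-Update step at the response layer only, using majority voting over sampled answers in the spirit of Self-Consistency. It is not taken from the survey or its repository: the function names, the `sample` callable, the `threshold` parameter, and the abstain-on-low-agreement rule are all assumptions made for this example, and the model call is replaced by a stub that returns canned answers.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def self_evaluate(responses: List[str]) -> Tuple[str, float]:
    """Self-Evaluation (hypothetical): score agreement among sampled responses.

    Returns the most frequent response and the fraction of samples that match it.
    """
    if not responses:
        raise ValueError("at least one sampled response is required")
    answer, votes = Counter(responses).most_common(1)[0]
    return answer, votes / len(responses)


def self_update(answer: str, consistency: float, threshold: float) -> Optional[str]:
    """Self-Update (hypothetical): keep the majority answer only if agreement is
    high enough; otherwise abstain by returning None."""
    return answer if consistency >= threshold else None


def self_feedback(
    sample: Callable[[], str], n: int = 4, threshold: float = 0.5
) -> Tuple[Optional[str], float]:
    """Draw n samples, evaluate their consistency, then update the final response."""
    responses = [sample() for _ in range(n)]
    answer, consistency = self_evaluate(responses)
    return self_update(answer, consistency, threshold), consistency


# Stub standing in for repeated stochastic calls to an LLM (assumption).
canned = iter(["18", "18", "17", "18"])
final_answer, agreement = self_feedback(lambda: next(canned), n=4, threshold=0.5)
print(final_answer, agreement)
```

In this sketch the consistency signal is computed purely from final responses; signals drawn from the latent or decoding layers, which the abstract also mentions, would need access to model internals and are out of scope here.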
