Internal Consistency and Self-Feedback in Large Language Models: A Survey

Abstract

Large language models (LLMs) are expected to respond accurately, yet they often exhibit deficient reasoning or generate hallucinated content. To address these issues, a line of studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, has emerged. These studies share a common idea: the LLM evaluates and updates itself to mitigate the issues. Nonetheless, these efforts lack a unified, summarizing perspective, because existing surveys predominantly focus on categorization without examining the motivations behind the works. In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as deficient reasoning and hallucination. Internal Consistency assesses the coherence among an LLM's latent layer, decoding layer, and response layer based on sampling methodologies. Building on the Internal Consistency framework, we introduce Self-Feedback, a streamlined yet effective theoretical framework capable of mining Internal Consistency. The Self-Feedback framework consists of two modules, Self-Evaluation and Self-Update, and has been employed in numerous studies. We systematically classify these studies by task and line of work, summarize relevant evaluation methods and benchmarks, and delve into the question "Does Self-Feedback Really Work?" We propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data at https://github.com/IAAR-Shanghai/ICSFSurvey.
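To make the two-module structure concrete, the following is a minimal sketch of a Self-Feedback loop at the response layer only, written in the spirit of Self-Consistency style majority voting. It is an illustration under stated assumptions, not the survey's implementation: the function names (`self_evaluate`, `self_update`, `self_feedback`, `make_toy_sampler`), the agreement-ratio score, and the abstention threshold are all hypothetical, and the toy sampler stands in for repeated sampling from a real LLM.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Iterable, List, Optional, Tuple


def self_evaluate(responses: List[str]) -> Tuple[str, float]:
    """Self-Evaluation (assumed form): measure response-level consistency
    as the fraction of samples that agree with the most common answer."""
    counts = Counter(responses)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(responses)


def self_update(answer: str, consistency: float, threshold: float) -> Optional[str]:
    """Self-Update (assumed form): keep the majority answer when the samples
    are consistent enough, otherwise abstain by returning None."""
    return answer if consistency >= threshold else None


def self_feedback(
    sample: Callable[[], str],
    n_samples: int = 5,
    threshold: float = 0.5,
) -> Tuple[Optional[str], float]:
    """Sample several responses, evaluate their agreement, then update."""
    responses = [sample() for _ in range(n_samples)]
    answer, consistency = self_evaluate(responses)
    return self_update(answer, consistency, threshold), consistency


def make_toy_sampler(answers: Iterable[str]) -> Callable[[], str]:
    """Hypothetical stand-in for an LLM: replays a fixed list of answers."""
    stream = cycle(answers)
    return lambda: next(stream)


if __name__ == "__main__":
    sampler = make_toy_sampler(["42", "42", "41", "42", "17"])
    final_answer, score = self_feedback(sampler)
    print(final_answer, score)
```

Replacing the toy sampler with a function that queries a model at non-zero temperature would turn this into a basic response-level consistency check. Consistency at the latent and decoding layers would need access to model internals and is not covered by this sketch.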

