Internal Consistency and Self-Feedback in Large Language Models: A Survey
July 19, 2024
Authors: Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Feiyu Xiong, Zhiyu Li
cs.AI
Abstract
Large language models (LLMs) are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these issues, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, have been initiated. They share a commonality: they involve LLMs evaluating and updating themselves to mitigate the issues. Nonetheless, these efforts lack a unified summarizing perspective, as existing surveys predominantly focus on categorization without examining the motivations behind these works.
In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as the lack of reasoning and the presence of hallucinations. Internal Consistency assesses the coherence among LLMs' latent layer, decoding layer, and response layer based on sampling methodologies. Expanding upon the Internal Consistency framework, we introduce a streamlined yet effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback framework consists of two modules: Self-Evaluation and Self-Update. This framework has been employed in numerous studies.
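To make the two modules concrete, below is a minimal illustrative sketch of a Self-Feedback loop in Python. The `llm` callable, the prompt wording, and the stopping rule are assumptions for illustration only; the survey covers many instantiations of Self-Evaluation and Self-Update, not this particular one.

```python
def self_feedback(llm, question: str, max_rounds: int = 3) -> str:
    """Let a model critique (Self-Evaluation) and revise (Self-Update) its own answer."""
    answer = llm(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        # Self-Evaluation: the model produces a feedback signal about its current answer.
        feedback = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any errors or missing reasoning steps. Reply with 'OK' if the answer is sound."
        )
        if feedback.strip().upper().startswith("OK"):
            break  # no issues found; stop updating
        # Self-Update: the model revises the answer using its own feedback.
        answer = llm(
            f"Question: {question}\nDraft answer: {answer}\nFeedback: {feedback}\n"
            "Rewrite the answer so that it addresses the feedback:"
        )
    return answer

# Usage with any text-in/text-out model wrapper (hypothetical):
# final = self_feedback(my_model, "What is 17 * 24?")
```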
We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern "Does Self-Feedback Really Work?" We propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data at https://github.com/IAAR-Shanghai/ICSFSurvey.
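As a small companion to the "Consistency Is (Almost) Correctness" viewpoint, the sketch below estimates response-layer consistency in the spirit of Self-Consistency: sample several answers and treat the agreement ratio of the majority answer as a consistency signal. The sampling interface and the exact-match agreement metric are simplifying assumptions, not the survey's prescribed measure.

```python
from collections import Counter

def response_consistency(sample_answer, question: str, n: int = 8):
    """Sample n answers; return (majority answer, fraction of samples agreeing with it)."""
    answers = [sample_answer(question) for _ in range(n)]  # stochastic decoding, e.g. temperature > 0
    majority, count = Counter(answers).most_common(1)[0]   # most frequent answer
    return majority, count / n                             # agreement ratio as a consistency score
```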