Internal Consistency and Self-Feedback in Large Language Models: A Survey
July 19, 2024
Authors: Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Feiyu Xiong, Zhiyu Li
cs.AI
Abstract
Large language models (LLMs) are expected to respond accurately but often
exhibit deficient reasoning or generate hallucinatory content. To address
these issues, studies prefixed with ``Self-'', such as Self-Consistency,
Self-Improve, and Self-Refine, have been initiated. They share a commonality:
they involve LLMs evaluating and updating themselves to mitigate the issues.
Nonetheless, these
efforts lack a unified perspective on summarization, as existing surveys
predominantly focus on categorization without examining the motivations behind
these works.
In this paper, we summarize a theoretical framework, termed Internal
Consistency, which offers unified explanations for phenomena such as the lack
of reasoning and the presence of hallucinations. Internal Consistency assesses
the coherence among LLMs' latent layer, decoding layer, and response layer
based on sampling methodologies. Expanding upon the Internal Consistency
framework, we introduce a streamlined yet effective theoretical framework
capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback
framework consists of two modules: Self-Evaluation and Self-Update. This
framework has been employed in numerous studies.
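As a rough illustration only, the following is a minimal sketch of how a Self-Feedback loop could be organized, with Self-Evaluation and Self-Update as the two modules. The `llm.generate` interface, the prompts, and the stopping criterion below are hypothetical placeholders, not the survey's reference implementation.

```python
# Minimal, hypothetical sketch of a Self-Feedback loop: the model critiques
# its own draft (Self-Evaluation) and then revises it (Self-Update) until
# the critique reports no issues or the round budget is exhausted.

def self_feedback(llm, prompt, max_rounds=3):
    response = llm.generate(prompt)
    for _ in range(max_rounds):
        # Self-Evaluation: ask the same model to critique its own response.
        critique = llm.generate(
            f"Question: {prompt}\nAnswer: {response}\n"
            "Point out any factual or reasoning errors, or reply 'OK'."
        )
        if critique.strip() == "OK":
            break  # judged consistent enough; stop updating
        # Self-Update: regenerate the answer conditioned on the critique.
        response = llm.generate(
            f"Question: {prompt}\nPrevious answer: {response}\n"
            f"Feedback: {critique}\nWrite an improved answer."
        )
    return response
```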
We systematically classify these studies by tasks and lines of work;
summarize relevant evaluation methods and benchmarks; and delve into the
concern, ``Does Self-Feedback Really Work?'' We propose several critical
viewpoints, including the ``Hourglass Evolution of Internal Consistency'',
``Consistency Is (Almost) Correctness'' hypothesis, and ``The Paradox of Latent
and Explicit Reasoning''. Furthermore, we outline promising directions for
future research. We have open-sourced the experimental code, reference list,
and statistical data, available at
https://github.com/IAAR-Shanghai/ICSFSurvey.
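For intuition on the ``Consistency Is (Almost) Correctness'' hypothesis, the sketch below measures response-layer consistency by majority voting over sampled answers, in the spirit of Self-Consistency. The `sample_answers` helper is a hypothetical stand-in for drawing multiple answers from the same LLM with nonzero temperature; it is not part of the survey's released code.

```python
from collections import Counter

def consistency_vote(sample_answers, prompt, n=10):
    """Return the majority answer and an agreement (consistency) score.

    `sample_answers(prompt, n)` is a hypothetical helper that draws n answers
    from the same LLM via stochastic sampling. High agreement among samples
    is treated as a proxy for correctness, per the hypothesis discussed here.
    """
    answers = sample_answers(prompt, n)
    counts = Counter(answers)
    best, freq = counts.most_common(1)[0]
    return best, freq / n  # majority answer and its consistency score
```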