Inside-Out: Hidden Factual Knowledge in LLMs
March 19, 2025
Authors: Zorik Gekhman, Eyal Ben David, Hadas Orgad, Eran Ofek, Yonatan Belinkov, Idan Szpektor, Jonathan Herzig, Roi Reichart
cs.AI
Abstract
This work presents a framework for assessing whether large language models
(LLMs) encode more factual knowledge in their parameters than what they express
in their outputs. While a few studies hint at this possibility, none has
clearly defined or demonstrated this phenomenon. We first propose a formal
definition of knowledge, quantifying it for a given question as the fraction of
correct-incorrect answer pairs where the correct one is ranked higher. This
gives rise to external and internal knowledge, depending on the information
used to score individual answer candidates: either the model's observable
token-level probabilities or its intermediate computations. Hidden knowledge
arises when internal knowledge exceeds external knowledge. We then present a
case study, applying this framework to three popular open-weights LLMs in a
closed-book QA setup. Our results indicate that: (1) LLMs consistently encode
more factual knowledge internally than what they express externally, with an
average gap of 40%. (2) Surprisingly, some knowledge is so deeply hidden that a
model can internally know an answer perfectly, yet fail to generate it even
once, despite large-scale repeated sampling of 1,000 answers. This reveals
fundamental limitations in the generation capabilities of LLMs, which (3) puts
a practical constraint on scaling test-time compute via repeated answer
sampling in closed-book QA: significant performance improvements remain
inaccessible because some answers are practically never sampled, yet if they
were, we would be guaranteed to rank them first.Summary
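One plausible external scorer ranks a candidate answer by its summed token-level log-probability under the model. This is a sketch under our own assumptions (the `Q:`/`A:` prompt template and the choice of a summed, rather than length-normalized, log-probability are illustrative; `model` and `tokenizer` are any Hugging Face causal LM and its tokenizer):

```python
import torch

def answer_logprob(model, tokenizer, question: str, answer: str) -> float:
    """Summed token-level log-probability of `answer` given a QA prompt."""
    prompt = f"Q: {question}\nA:"                  # template is an assumption
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits            # [1, seq_len, vocab]
    log_probs = torch.log_softmax(logits, dim=-1)
    score = 0.0
    # The token at position t is predicted by the logits at position t-1;
    # tokenization boundary effects are ignored in this sketch.
    for t in range(prompt_ids.shape[1], full_ids.shape[1]):
        score += log_probs[0, t - 1, full_ids[0, t]].item()
    return score
```

A scorer such as `lambda q, a: answer_logprob(model, tokenizer, q, a)` can then be passed to `knowledge_score` above.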
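Finding (2) can likewise be probed with large-scale repeated sampling: generate many answers and check whether a known gold answer ever appears. The decoding parameters below are assumptions (the paper samples 1,000 answers per question; its exact generation settings are not given in the abstract):

```python
def gold_ever_sampled(model, tokenizer, question: str, gold: str,
                      n_samples: int = 1000, batch: int = 50) -> bool:
    """True if `gold` appears in any of `n_samples` sampled answers."""
    prompt = f"Q: {question}\nA:"                  # template is an assumption
    inputs = tokenizer(prompt, return_tensors="pt")
    for _ in range(n_samples // batch):
        out = model.generate(
            **inputs,
            do_sample=True,                        # temperature sampling
            max_new_tokens=20,
            num_return_sequences=batch,
            pad_token_id=tokenizer.eos_token_id,
        )
        answers = tokenizer.batch_decode(
            out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
        )
        if any(gold.lower() in a.lower() for a in answers):
            return True
    return False
```

When this returns False for a question on which internal knowledge is perfect, the answer is hidden in the paper's sense: the model would rank it first if it were ever generated, yet sampling alone never surfaces it.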