Inside-Out: Hidden Factual Knowledge in LLMs
March 19, 2025
Authors: Zorik Gekhman, Eyal Ben David, Hadas Orgad, Eran Ofek, Yonatan Belinkov, Idan Szpektor, Jonathan Herzig, Roi Reichart
cs.AI
Abstract
This work presents a framework for assessing whether large language models
(LLMs) encode more factual knowledge in their parameters than what they express
in their outputs. While a few studies hint at this possibility, none has
clearly defined or demonstrated this phenomenon. We first propose a formal
definition of knowledge, quantifying it for a given question as the fraction of
correct-incorrect answer pairs where the correct one is ranked higher. This
gives rise to external and internal knowledge, depending on the information
used to score individual answer candidates: either the model's observable
token-level probabilities or its intermediate computations. Hidden knowledge
arises when internal knowledge exceeds external knowledge. We then present a
case study, applying this framework to three popular open-weights LLMs in a
closed-book QA setup. Our results indicate that: (1) LLMs consistently encode
more factual knowledge internally than what they express externally, with an
average gap of 40%. (2) Surprisingly, some knowledge is so deeply hidden that a
model can internally know an answer perfectly, yet fail to generate it even
once, despite large-scale repeated sampling of 1,000 answers. This reveals
fundamental limitations in the generation capabilities of LLMs, which (3) puts
a practical constraint on scaling test-time compute via repeated answer
sampling in closed-book QA: significant performance improvements remain
inaccessible because some answers are practically never sampled, yet if they
were, we would be guaranteed to rank them first.
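
To make the definition concrete, below is a minimal sketch of the pairwise knowledge metric as described in the abstract, not the authors' released code. The candidate scores are invented for illustration; external and internal knowledge differ only in the scoring function used (observable token-level probabilities vs. a probe over the model's intermediate computations).

```python
import itertools

def knowledge_score(correct_scores, incorrect_scores):
    # K(q): fraction of (correct, incorrect) answer pairs for a question
    # in which the correct candidate receives the higher score.
    pairs = list(itertools.product(correct_scores, incorrect_scores))
    return sum(c > i for c, i in pairs) / len(pairs)

# Hypothetical candidate scores for one question (invented numbers).
# External scoring: the model's observable token-level probabilities,
# e.g., the mean log-probability of each answer's tokens.
external_K = knowledge_score([-1.2, -0.8], [-0.5, -2.1, -3.0])

# Internal scoring: e.g., a probe over intermediate computations
# (hidden states) that outputs a correctness score per candidate.
internal_K = knowledge_score([0.91, 0.85], [0.40, 0.62, 0.10])

print(external_K, internal_K)   # 0.667 1.0 (illustrative)
print(internal_K > external_K)  # True -> hidden knowledge on this question
```

Averaging this per-question score over a dataset would then yield the aggregate external and internal knowledge figures the paper compares, with hidden knowledge arising wherever the internal score exceeds the external one.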