Estimating Knowledge in Large Language Models Without Generating a Single Token
June 18, 2024
Authors: Daniela Gottesman, Mor Geva
cs.AI
Abstract
To evaluate knowledge in large language models (LLMs), current methods query
the model and then evaluate its generated responses. In this work, we ask
whether evaluation can be done before the model has generated any
text. Concretely, is it possible to estimate how knowledgeable a model is about
a certain entity, only from its internal computation? We study this question
with two tasks: given a subject entity, the goal is to predict (a) the ability
of the model to answer common questions about the entity, and (b) the
factuality of responses generated by the model about the entity. Experiments
with a variety of LLMs show that KEEN, a simple probe trained over internal
subject representations, succeeds at both tasks, correlating strongly with
the model's per-subject QA accuracy and with FActScore, a recent
factuality metric for open-ended generation. Moreover, KEEN naturally aligns
with the model's hedging behavior and faithfully reflects changes in the
model's knowledge after fine-tuning. Lastly, we show a more interpretable yet
equally performant variant of KEEN, which highlights a small set of tokens that
correlates with the model's lack of knowledge. Being simple and lightweight,
KEEN can be leveraged to identify gaps and clusters of entity knowledge in
LLMs, and guide decisions such as augmenting queries with retrieval.
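To make the setup concrete, below is a minimal sketch of a KEEN-style probe, not the authors' implementation: a ridge regression trained on a model's internal representation of the subject entity to predict per-subject QA accuracy. The gpt2 checkpoint, the probed layer, the use of the last subject-token hidden state, and the toy accuracy labels are all assumptions made for illustration.

```python
import torch
from sklearn.linear_model import Ridge
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder checkpoint; the paper evaluates a variety of LLMs
LAYER = 6            # hypothetical intermediate layer to probe

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def subject_representation(subject: str) -> torch.Tensor:
    """Hidden state of the subject's last token at the chosen layer."""
    inputs = tokenizer(subject, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple of (batch, seq_len, hidden_dim) tensors, one per layer
    return outputs.hidden_states[LAYER][0, -1]

# Toy supervision: per-subject QA accuracy in [0, 1], obtained separately by
# querying the model on common questions about each subject. Values here are
# fabricated purely for illustration.
subjects = ["Albert Einstein", "Marie Curie", "Ruritania"]
qa_accuracy = [0.9, 0.8, 0.1]

X = torch.stack([subject_representation(s) for s in subjects]).numpy()
probe = Ridge(alpha=1.0).fit(X, qa_accuracy)

# Score a new subject without the model generating a single token.
x_new = subject_representation("Ada Lovelace").numpy()[None, :]
print(f"Estimated knowledge score: {probe.predict(x_new)[0]:.2f}")
```

Because the probe reads a single hidden state, scoring a subject costs one forward pass with no decoding, which is what makes this kind of estimator lightweight enough to guide per-query decisions such as whether to augment a query with retrieval.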