Estimating Knowledge in Large Language Models Without Generating a Single Token
June 18, 2024
Authors: Daniela Gottesman, Mor Geva
cs.AI
Abstract
To evaluate knowledge in large language models (LLMs), current methods query
the model and then evaluate its generated responses. In this work, we ask
whether evaluation can be done before the model has generated any
text. Concretely, is it possible to estimate how knowledgeable a model is about
a certain entity, only from its internal computation? We study this question
with two tasks: given a subject entity, the goal is to predict (a) the ability
of the model to answer common questions about the entity, and (b) the
factuality of responses generated by the model about the entity. Experiments
with a variety of LLMs show that KEEN, a simple probe trained over internal
subject representations, succeeds at both tasks, strongly correlating with
both the model's per-subject QA accuracy and FActScore, a recent
factuality metric in open-ended generation. Moreover, KEEN naturally aligns
with the model's hedging behavior and faithfully reflects changes in the
model's knowledge after fine-tuning. Lastly, we show a more interpretable yet
equally performant variant of KEEN, which highlights a small set of tokens that
correlates with the model's lack of knowledge. Being simple and lightweight,
KEEN can be leveraged to identify gaps and clusters of entity knowledge in
LLMs, and guide decisions such as augmenting queries with retrieval.
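To make the setup concrete, below is a minimal, hypothetical sketch of such a probe in Python (PyTorch + Hugging Face transformers). The model name gpt2, the choice of the last layer, the use of the subject's final token as the representation, and the toy training pairs are all illustrative assumptions; the paper's actual probe, feature extraction, and training procedure may differ.

```python
# A minimal sketch of a KEEN-style probe, NOT the paper's implementation.
# Assumptions (all hypothetical): "gpt2" as a stand-in model, the last-layer
# hidden state of the subject's final token as the representation, and toy
# (subject, per-subject QA accuracy) pairs as training targets.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; the paper experiments with a variety of LLMs
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
lm.eval()

def subject_representation(subject: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the subject's last token at `layer`; no text is generated."""
    ids = tok(subject, return_tensors="pt")
    with torch.no_grad():  # the LLM stays frozen; only the probe is trained
        out = lm(**ids)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# Linear probe mapping a subject representation to a knowledge score in [0, 1].
probe = nn.Sequential(nn.Linear(lm.config.hidden_size, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy supervision: per-subject QA accuracy measured offline (values invented).
train_data = [("Marie Curie", 0.95), ("Obscure Example Entity", 0.10)]

for epoch in range(100):
    for subject, qa_accuracy in train_data:
        pred = probe(subject_representation(subject))
        loss = loss_fn(pred, torch.tensor([qa_accuracy]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because a probe like this reads only a single forward pass over the subject tokens, it can run before any generation, e.g., to flag likely knowledge gaps and decide whether to augment the query with retrieval.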