Estimating Knowledge in Large Language Models Without Generating a Single Token

Abstract

To evaluate knowledge in large language models (LLMs), current methods query the model and then evaluate its generated responses. In this work, we ask whether evaluation can be done before the model has generated any text. Concretely, is it possible to estimate how knowledgeable a model is about a certain entity from its internal computation alone? We study this question with two tasks: given a subject entity, the goal is to predict (a) the model's ability to answer common questions about the entity, and (b) the factuality of responses the model generates about the entity. Experiments with a variety of LLMs show that KEEN, a simple probe trained over internal subject representations, succeeds at both tasks: its predictions correlate strongly with both the model's per-subject QA accuracy and FActScore, a recent factuality metric for open-ended generation. Moreover, KEEN naturally aligns with the model's hedging behavior and faithfully reflects changes in the model's knowledge after fine-tuning. Lastly, we show a more interpretable yet equally performant variant of KEEN, which highlights a small set of tokens that correlates with the model's lack of knowledge. Being simple and lightweight, KEEN can be leveraged to identify gaps and clusters of entity knowledge in LLMs, and to guide decisions such as augmenting queries with retrieval.
