Evaluating the Ripple Effects of Knowledge Editing in Language Models

Abstract

Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected, and whether similar predictions for other subjects have remained unchanged. Here we argue that such evaluation is limited, since injecting one fact (e.g. "Jack Depp is the son of Johnny Depp") introduces a "ripple effect" in the form of additional facts that the model needs to update (e.g. "Jack Depp is the sibling of Lily-Rose Depp"). To address this issue, we propose a novel set of evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct a diagnostic benchmark of 5K factual edits that captures a variety of types of ripple effects. We evaluate prominent editing methods on this benchmark, showing that current methods fail to introduce consistent changes in the model's knowledge. In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing.
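As a rough illustration of the ripple effect described above, the following toy sketch stores parent-child facts as pairs and derives sibling facts from them. It is not the benchmark or the evaluation criteria proposed here; the names `sibling_facts` and `child_of` and the pair-based representation are assumptions made only for this example.

```python
from collections import defaultdict


def sibling_facts(child_of):
    """Derive (person, "sibling", other) triples from (child, parent) pairs."""
    by_parent = defaultdict(set)
    for child, parent in child_of:
        by_parent[parent].add(child)

    derived = set()
    for children in by_parent.values():
        for a in children:
            for b in children:
                if a != b:
                    derived.add((a, "sibling", b))
    return derived


# Toy knowledge base before the edit (hypothetical representation).
child_of = {("Lily-Rose Depp", "Johnny Depp")}
before = sibling_facts(child_of)

# Inject the single edited fact from the example in the abstract.
child_of.add(("Jack Depp", "Johnny Depp"))
after = sibling_facts(child_of)

# Facts that hold only as a consequence of the edit: the "ripple".
ripple = after - before
print(sorted(ripple))
```

In this sketch a single injected fact yields two additional sibling facts. An evaluation that checks only the injected fact would miss them.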
