Evaluating the Ripple Effects of Knowledge Editing in Language Models
July 24, 2023
Authors: Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva
cs.AI
Abstract
Modern language models capture a large body of factual knowledge. However,
some facts can be incorrectly induced or become obsolete over time, resulting
in factually incorrect generations. This has led to the development of various
editing methods that allow updating facts encoded by the model. Evaluation of
these methods has primarily focused on testing whether an individual fact has
been successfully injected, and if similar predictions for other subjects have
not changed. Here we argue that such evaluation is limited, since injecting one
fact (e.g. ``Jack Depp is the son of Johnny Depp'') introduces a ``ripple
effect'' in the form of additional facts that the model needs to update
(e.g. ``Jack Depp is the sibling of Lily-Rose Depp''). To address this issue, we
propose a novel set of evaluation criteria that consider the implications of an
edit on related facts. Using these criteria, we then construct , a
diagnostic benchmark of 5K factual edits, capturing a variety of types of
ripple effects. We evaluate prominent editing methods on this benchmark, showing
that current methods fail to introduce consistent changes in the model's
knowledge. In addition, we find that a simple in-context editing baseline
obtains the best scores on our benchmark, suggesting a promising research
direction for model editing.
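The in-context editing baseline mentioned above can be illustrated with a minimal sketch. The idea is to leave the model's weights untouched and instead prepend the edited fact to the prompt, so the model conditions on it when answering questions about related facts. The `build_edit_prompt` helper and the "Imagine that …" phrasing below are illustrative assumptions, not the paper's exact prompt format.

```python
# Sketch of an in-context editing baseline: instead of updating model weights,
# the edited fact is injected into the prompt itself. (Hypothetical prompt
# format for illustration; the paper's actual template may differ.)

def build_edit_prompt(edited_fact: str, question: str) -> str:
    """Prepend the new fact to the query so the model can use it in-context."""
    return (
        f"Imagine that {edited_fact}\n"
        f"Q: {question}\n"
        f"A:"
    )

# A ripple-effect probe: after injecting the fact about Jack Depp, a consistent
# model should also update the implied sibling relation.
edit = "Jack Depp is the son of Johnny Depp."
ripple_question = "Who is the sibling of Lily-Rose Depp?"

prompt = build_edit_prompt(edit, ripple_question)
print(prompt)
```

Testing ripple effects then amounts to querying the model with such prompts for each related fact (sibling, parent, spouse, and so on) and checking whether the answers are consistent with the injected edit, rather than only checking the edited fact itself.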