

Evaluating the Ripple Effects of Knowledge Editing in Language Models

July 24, 2023
作者: Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva
cs.AI

Abstract

Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected, and whether similar predictions for other subjects have remained unchanged. Here we argue that such evaluation is limited, since injecting one fact (e.g. "Jack Depp is the son of Johnny Depp") introduces a "ripple effect" in the form of additional facts that the model needs to update (e.g. "Jack Depp is the sibling of Lily-Rose Depp"). To address this issue, we propose a novel set of evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct RippleEdits, a diagnostic benchmark of 5K factual edits, capturing a variety of types of ripple effects. We evaluate prominent editing methods on RippleEdits, showing that current methods fail to introduce consistent changes in the model's knowledge. In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing.
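The in-context editing baseline mentioned above can be illustrated with a short sketch. The idea is that, rather than modifying model weights, the edited fact is prepended to the prompt so the model conditions on it; a ripple-effect evaluation then pairs the injected fact with queries about logically implied facts. This is a minimal illustration under assumed prompt phrasing, not the paper's implementation; the `build_edited_prompt` helper and the exact instruction wording are hypothetical.

```python
# Sketch of an in-context editing baseline (hypothetical helper, not
# the paper's code): the edited fact is injected via the prompt
# instead of a weight update.

def build_edited_prompt(edit: str, query: str) -> str:
    """Prepend an updated/counterfactual fact to a query prompt."""
    return f"Imagine that {edit}\n{query}"

# The injected fact and two kinds of test queries: a direct query
# about the edit itself, and a "ripple" query about a fact the edit
# logically implies (here, a sibling relation through the parent).
edit = "Jack Depp is the son of Johnny Depp."
direct_query = "Who is Jack Depp's father?"
ripple_query = "Who are Jack Depp's siblings?"

direct_prompt = build_edited_prompt(edit, direct_query)
ripple_prompt = build_edited_prompt(edit, ripple_query)
```

Both prompts would then be fed to the language model; the model's answers are checked for consistency with the edit and its implications, rather than only with the single injected fact.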