
Should We Really Edit Language Models? On the Evaluation of Edited Language Models

October 24, 2024
作者: Qi Li, Xiang Liu, Zhenheng Tang, Peijie Dong, Zeyu Li, Xinglin Pan, Xiaowen Chu
cs.AI

Abstract

Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, and many methods excel across these criteria. Some recent works disclose the pitfalls of these editing methods, such as knowledge distortion or conflict. However, the general abilities of post-edited language models remain unexplored. In this paper, we perform a comprehensive evaluation of various editing methods and different language models, with the following findings. (1) Existing editing methods lead to inevitable performance deterioration on general benchmarks, indicating that they maintain the general abilities of the model for only a few dozen edits. When the number of edits grows slightly larger, the intrinsic knowledge structure of the model is disrupted or even completely destroyed. (2) Instruction-tuned models are more robust to editing, showing a smaller drop in general-knowledge performance after editing. (3) Large-scale language models are more resistant to editing than small models. (4) The safety of the edited model is significantly weakened, even for safety-aligned models. Our findings indicate that current editing methods are suitable only for small-scale knowledge updates within language models, which motivates further research on more practical and reliable editing methods. Code and reproduction details can be found at https://github.com/lqinfdim/EditingEvaluation.
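The evaluation protocol the abstract describes, applying a sequence of knowledge edits and tracking general-benchmark performance after each one, can be illustrated with a minimal sketch. This is a toy illustration only: the dict-based "model", the function names, and the benchmark format are all hypothetical placeholders, not the authors' implementation or any real editing API.

```python
# Toy sketch of sequential-edit evaluation: after each knowledge edit,
# re-measure accuracy on a fixed "general ability" benchmark to observe
# any degradation. All names and data structures are illustrative.

def apply_edit(model, subject, new_fact):
    """Insert or overwrite a single fact (one 'knowledge edit')."""
    edited = dict(model)
    edited[subject] = new_fact
    return edited

def benchmark_accuracy(model, benchmark):
    """Fraction of (query -> expected answer) pairs the model answers correctly."""
    correct = sum(1 for q, a in benchmark.items() if model.get(q) == a)
    return correct / len(benchmark)

def sequential_edit_eval(model, edits, benchmark):
    """Apply edits one by one; return benchmark accuracy after each edit."""
    scores = []
    for subject, fact in edits:
        model = apply_edit(model, subject, fact)
        scores.append(benchmark_accuracy(model, benchmark))
    return scores
```

In the real setting, `apply_edit` would be an editing method such as ROME or MEMIT applied to model weights, and `benchmark_accuracy` would run a general benchmark; the sketch only conveys the measure-after-every-edit loop.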

