Should We Really Edit Language Models? On the Evaluation of Edited Language Models

Abstract

Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, and many methods excel across these criteria. Some recent works have revealed pitfalls of these editing methods, such as knowledge distortion or conflict. However, the general abilities of post-edited language models remain unexplored. In this paper, we perform a comprehensive evaluation of various editing methods on different language models and arrive at the following findings. (1) Existing editing methods inevitably degrade performance on general benchmarks, indicating that they preserve the model's general abilities for only a few dozen edits. When the number of edits grows even slightly larger, the intrinsic knowledge structure of the model is disrupted or even completely damaged. (2) Instruction-tuned models are more robust to editing, showing a smaller performance drop on general knowledge after editing. (3) Larger language models are more resistant to editing than smaller ones. (4) The safety of edited models is significantly weakened, even for safety-aligned models. Our findings indicate that current editing methods are suitable only for small-scale knowledge updates within language models, which motivates further research on more practical and reliable editing methods. Code and reproduction details are available at https://github.com/lqinfdim/EditingEvaluation.
