Evaluating the Ripple Effects of Knowledge Editing in Language Models

Abstract

Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected, and whether similar predictions for other subjects have remained unchanged. Here we argue that such evaluation is limited, since injecting one fact (e.g. "Jack Depp is the son of Johnny Depp") introduces a "ripple effect" in the form of additional facts that the model needs to update (e.g. "Jack Depp is the sibling of Lily-Rose Depp"). To address this issue, we propose a novel set of evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct a diagnostic benchmark of 5K factual edits, capturing a variety of types of ripple effects. We evaluate prominent editing methods on this benchmark, showing that current methods fail to introduce consistent changes in the model's knowledge. In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing.
