Can Knowledge Editing Really Correct Hallucinations?

Abstract

Large Language Models (LLMs) suffer from hallucinations, that is, non-factual information in generated content, despite their strong capabilities across tasks. Meanwhile, knowledge editing has emerged as a popular paradigm for correcting erroneous factual knowledge encoded in LLMs, with the advantage of avoiding retraining from scratch. However, a common issue with existing evaluation datasets for knowledge editing is that they do not ensure that LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs edited by different techniques are evaluated on such datasets, the resulting performance cannot be used directly to assess how effectively each knowledge editing method corrects hallucinations. Thus, a fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in LLMs? We propose HalluEditBench to holistically benchmark knowledge editing methods in correcting real-world hallucinations. First, we rigorously construct a massive hallucination dataset with 9 domains, 26 topics, and more than 6,000 hallucinations. Then, we assess the performance of knowledge editing methods holistically along five dimensions: Efficacy, Generalization, Portability, Locality, and Robustness. Through HalluEditBench, we provide new insights into the potential and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate progress in the field of knowledge editing.
