MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
February 27, 2025
Authors: Yuntao Du, Kailin Jiang, Zhi Gao, Chenrui Shi, Zilong Zheng, Siyuan Qi, Qing Li
cs.AI
Abstract
Knowledge editing techniques have emerged as essential tools for updating the
factual knowledge of large language models (LLMs) and multimodal models (LMMs),
allowing them to correct outdated or inaccurate information without retraining
from scratch. However, existing benchmarks for multimodal knowledge editing
primarily focus on entity-level knowledge represented as simple triplets, which
fail to capture the complexity of real-world multimodal information. To address
this issue, we introduce MMKE-Bench, a comprehensive MultiModal Knowledge
Editing Benchmark, designed to evaluate the ability of LMMs to edit diverse
visual knowledge in real-world scenarios. MMKE-Bench addresses these
limitations by incorporating three types of editing tasks: visual entity
editing, visual semantic editing, and user-specific editing. In addition,
MMKE-Bench uses free-form natural language to represent and edit knowledge,
offering a more flexible and effective format. The benchmark consists of 2,940
pieces of knowledge and 8,363 images across 33 broad categories, with
evaluation questions automatically generated and human-verified. We assess five
state-of-the-art knowledge editing methods on three prominent LMMs, revealing
that no method excels across all criteria, and that visual and user-specific
edits are particularly challenging. MMKE-Bench sets a new standard for
evaluating the robustness of multimodal knowledge editing techniques, driving
progress in this rapidly evolving field.
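To make the task format concrete, below is a minimal sketch, in Python, of what a single editing example and its evaluation might look like under the benchmark's three task types. The record schema (field names such as edit_prompt and locality_question) and the reliability/locality checks are illustrative assumptions, not MMKE-Bench's released data format or evaluation code.

```python
# Hypothetical sketch of a multimodal knowledge-editing record and two
# common evaluation checks (reliability and locality). Field names and
# scoring are illustrative assumptions, NOT MMKE-Bench's actual schema.
from dataclasses import dataclass
from typing import Callable, Literal

EditType = Literal["visual_entity", "visual_semantic", "user_specific"]

@dataclass
class EditRecord:
    edit_type: EditType        # one of the paper's three editing task types
    image_path: str            # image the knowledge is grounded in
    edit_prompt: str           # free-form natural-language statement of the new knowledge
    eval_question: str         # probe: did the edit take effect?
    target_answer: str         # expected answer after a successful edit
    locality_question: str     # unrelated probe that should be unaffected
    locality_answer: str       # pre-edit answer that should be preserved

def evaluate(record: EditRecord, edited_model: Callable[[str, str], str]) -> dict:
    """Score one edit: reliability (the edit applied) and locality
    (unrelated knowledge preserved)."""
    def norm(s: str) -> str:
        return s.strip().lower()
    rel_pred = edited_model(record.image_path, record.eval_question)
    loc_pred = edited_model(record.image_path, record.locality_question)
    return {
        "reliability": norm(rel_pred) == norm(record.target_answer),
        "locality": norm(loc_pred) == norm(record.locality_answer),
    }

if __name__ == "__main__":
    record = EditRecord(
        edit_type="visual_semantic",
        image_path="images/referee_signal.jpg",  # hypothetical path
        edit_prompt="The raised-arm gesture shown signals a substitution.",
        eval_question="What does the gesture in the image signal?",
        target_answer="a substitution",
        locality_question="What sport is being played?",
        locality_answer="soccer",
    )
    # Stand-in for an edited LMM, for demonstration only.
    def dummy_model(image: str, question: str) -> str:
        return "a substitution" if "gesture" in question else "soccer"
    print(evaluate(record, dummy_model))  # {'reliability': True, 'locality': True}
```

Representing the edit as a free-form edit_prompt string, rather than a subject-relation-object triplet, mirrors the paper's move toward more flexible natural-language knowledge representations.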