MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models
October 22, 2025
Authors: Kailin Jiang, Ning Jiang, Yuchen Ren, Yuchen Li, Yifan Gao, Jinhe Bi, Yunpu Ma, Qingqing Liu, Xianhao Wang, Yifan Jia, Hongbo Jiang, Yaocong Hu, Bin Li, Lei Liu, Yuntao Du
cs.AI
Abstract
Large Multimodal Models (LMMs) encode rich factual knowledge via cross-modal
pre-training, yet their static representations struggle to maintain an accurate
understanding of time-sensitive factual knowledge. Existing benchmarks remain
constrained by static designs, inadequately evaluating LMMs' ability to
understand time-sensitive knowledge. To address this gap, we propose MINED, a
comprehensive benchmark that evaluates temporal awareness along six key
dimensions (cognition, awareness, trustworthiness, understanding, reasoning,
and robustness) and 11 challenging tasks. MINED is constructed from Wikipedia
by two professional annotators, containing 2,104 time-sensitive knowledge
samples spanning six knowledge types. Evaluating 15 widely used LMMs on MINED
shows that Gemini-2.5-Pro achieves the highest average CEM score of 63.07,
while most open-source LMMs still lack temporal understanding. Meanwhile,
LMMs perform best on organization knowledge and worst on sport knowledge.
To address these challenges, we investigate the feasibility
of updating time-sensitive knowledge in LMMs through knowledge editing methods
and observe that these methods can effectively update such knowledge in
single-edit scenarios.