MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models

Abstract

Large Multimodal Models (LMMs) encode rich factual knowledge via cross-modal pre-training, yet their static representations struggle to maintain an accurate understanding of time-sensitive factual knowledge. Existing benchmarks remain constrained by static designs and therefore inadequately evaluate LMMs' ability to understand time-sensitive knowledge. To address this gap, we propose MINED, a comprehensive benchmark that evaluates temporal awareness along 6 key dimensions (cognition, awareness, trustworthiness, understanding, reasoning, and robustness) and 11 challenging tasks. MINED is constructed from Wikipedia by two professional annotators and contains 2,104 time-sensitive knowledge samples spanning six knowledge types. Evaluating 15 widely used LMMs on MINED shows that Gemini-2.5-Pro achieves the highest average CEM score of 63.07, while most open-source LMMs still lack time understanding ability. Across knowledge types, LMMs perform best on organization knowledge and worst on sport. To address these challenges, we investigate the feasibility of updating time-sensitive knowledge in LMMs through knowledge editing methods and observe that LMMs can effectively update knowledge via these methods in single editing scenarios.
