KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints
October 22, 2025
Authors: Kailin Jiang, Hongbo Jiang, Ning Jiang, Zhi Gao, Jinhe Bi, Yuchen Ren, Bin Li, Yuntao Du, Lei Liu, Qing Li
cs.AI
Abstract
Large Multimodal Models encode extensive factual knowledge in their
pre-trained weights. However, this knowledge remains static and limited, unable
to keep pace with real-world developments, which hinders continuous knowledge
acquisition. Effective knowledge injection thus becomes critical, involving two
goals: knowledge adaptation (injecting new knowledge) and knowledge retention
(preserving old knowledge). Existing methods often struggle to learn new
knowledge and suffer from catastrophic forgetting. To address this, we propose
KORE, a synergistic method of KnOwledge-oRientEd augmentations and constraints
for injecting new knowledge into large multimodal models while preserving old
knowledge. Unlike general text or image data augmentation, KORE automatically
converts individual knowledge items into structured and comprehensive knowledge
to ensure that the model accurately learns new knowledge, enabling accurate
adaptation. Meanwhile, KORE stores previous knowledge in the covariance matrix
of the LMM's linear layer activations and initializes the adapter by projecting the
original weights into the matrix's null space, defining a fine-tuning direction
that minimizes interference with previous knowledge, enabling powerful
retention. Extensive experiments on various LMMs, including LLaVA-v1.5-7B,
LLaVA-v1.5-13B, and Qwen2.5-VL-7B, show that KORE achieves superior new
knowledge injection performance and effectively mitigates catastrophic
forgetting.
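
To make the retention constraint concrete, below is a minimal NumPy sketch of the null-space idea described in the abstract: the covariance of a linear layer's input activations on old-knowledge data is eigendecomposed, a projector onto its (approximate) null space is formed, and the original weight is projected through it to initialize an adapter. The function names (nullspace_projector, init_adapter_weight), the eigenvalue threshold eps, and the use of an uncentered covariance are illustrative assumptions, not KORE's actual implementation.

```python
import numpy as np

def nullspace_projector(activations: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Projector onto the approximate null space of the activation covariance.

    activations: (num_samples, in_features) inputs seen by one linear layer
    on data representing previously learned (old) knowledge.
    """
    # Uncentered covariance of the layer's inputs (in_features x in_features).
    cov = activations.T @ activations / activations.shape[0]
    # cov is symmetric PSD, so eigh applies; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Directions with near-zero eigenvalues are (almost) unused by the old data.
    null_basis = eigvecs[:, eigvals < eps * eigvals.max()]
    # P satisfies cov @ P ~ 0, so P @ x ~ 0 for inputs x drawn from the old data.
    return null_basis @ null_basis.T

def init_adapter_weight(weight: np.ndarray, projector: np.ndarray) -> np.ndarray:
    """Project an original weight (out_features, in_features) into the null space,
    so the resulting adapter direction barely perturbs outputs on old inputs."""
    return weight @ projector

# Toy usage with hypothetical sizes: 32 old-knowledge samples through a 64-d layer.
rng = np.random.default_rng(0)
acts = rng.standard_normal((32, 64))   # rank <= 32, so a null space exists
W = rng.standard_normal((16, 64))      # original linear-layer weight
P = nullspace_projector(acts)
W_adapter = init_adapter_weight(W, P)
# Old inputs are (nearly) annihilated by the projected weight:
print(np.abs(W_adapter @ acts.T).max())   # ~ 0
```

Under these assumptions, any update applied along the projected initialization changes the layer's response only in directions the old activations never occupied, which is one way to read the abstract's claim of minimizing interference with previous knowledge.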