Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models
December 2, 2025
Authors: Xiwen Wei, Mustafa Munir, Radu Marculescu
cs.AI
Abstract
Unified Multimodal Generative Models (UMGMs) unify visual understanding and image generation within a single autoregressive framework. However, their ability to continually learn new tasks is severely hindered by catastrophic forgetting, both within a modality (intra-modal) and across modalities (inter-modal). While intra-modal forgetting has been studied in prior continual learning (CL) work, inter-modal forgetting remains largely unexplored. In this paper, we identify and empirically validate this phenomenon in UMGMs and provide a theoretical explanation rooted in gradient conflict between modalities. To address both intra- and inter-modal forgetting, we propose Modality-Decoupled Experts (MoDE), a lightweight and scalable architecture that isolates modality-specific updates to mitigate the gradient conflict and leverages knowledge distillation to prevent catastrophic forgetting and preserve pre-trained capabilities. Unlike previous CL methods that remain modality-coupled and suffer from modality gradient conflict, MoDE explicitly decouples modalities to prevent interference. Experiments across diverse benchmarks demonstrate that MoDE significantly mitigates both inter- and intra-modal forgetting, outperforming prior CL baselines in unified multimodal generation settings. Code will be publicly available at: https://github.com/Christina200/MoDE-official.git
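The gradient-conflict intuition behind the abstract can be illustrated with a toy sketch (not the paper's implementation): two hypothetical "modality" objectives pull a shared parameter in opposing directions, so their gradients have negative cosine similarity and a coupled (averaged) update cancels out, while giving each modality its own expert copy of the parameter, in the spirit of MoDE, lets both objectives improve. The targets, learning rate, and quadratic losses below are illustrative assumptions, not taken from the paper.

```python
import math

# Toy quadratic objective per "modality": loss(w) = 0.5 * ||w - target||^2,
# whose gradient is simply (w - target).
def grad(w, target):
    return [wi - ti for wi, ti in zip(w, target)]

def loss(w, target):
    return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, target))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def step(w, g, lr):
    return [wi - lr * gi for wi, gi in zip(w, g)]

# Hypothetical optima for the two modalities, chosen to conflict.
target_und = [1.0, 0.0]   # "understanding" task optimum
target_gen = [-1.0, 0.0]  # "generation" task optimum
w_shared = [0.0, 0.0]     # shared pre-trained parameter
lr = 0.5

g_u = grad(w_shared, target_und)
g_g = grad(w_shared, target_gen)

# Inter-modal gradient conflict: the gradients point in opposite directions.
print(f"cosine(g_und, g_gen) = {cosine(g_u, g_g):.2f}")  # negative => conflict

# Modality-coupled update: the averaged gradient cancels, so neither
# modality's loss improves.
g_avg = [0.5 * (gu + gg) for gu, gg in zip(g_u, g_g)]
w_coupled = step(w_shared, g_avg, lr)

# Modality-decoupled experts (MoDE-style sketch): each modality updates
# its own expert copy, so there is no cross-modal interference.
w_expert_und = step(w_shared, g_u, lr)
w_expert_gen = step(w_shared, g_g, lr)

print("coupled   losses:", loss(w_coupled, target_und), loss(w_coupled, target_gen))
print("decoupled losses:", loss(w_expert_und, target_und), loss(w_expert_gen, target_gen))
```

In this construction the coupled update leaves both losses at their initial value, while each decoupled expert strictly reduces its own modality's loss; the real MoDE additionally uses knowledge distillation to keep the experts anchored to the pre-trained model.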