MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation
July 9, 2025
Authors: Hui Li, Pengfei Yang, Juanyang Chen, Le Dong, Yanxin Chen, Quan Wang
cs.AI
Abstract
Knowledge distillation, as an efficient knowledge transfer technique, has
achieved remarkable success in unimodal scenarios. However, in cross-modal
settings, conventional distillation methods encounter significant challenges
due to data and statistical heterogeneities, failing to leverage the
complementary prior knowledge embedded in cross-modal teacher models. This
paper empirically reveals two critical issues in existing approaches:
distillation path selection and knowledge drift. To address these limitations,
we propose MST-Distill, a novel cross-modal knowledge distillation framework
featuring a mixture of specialized teachers. Our approach employs a diverse
ensemble of teacher models across both cross-modal and multimodal
configurations, integrated with an instance-level routing network that
facilitates adaptive and dynamic distillation. This architecture effectively
transcends the constraints of traditional methods that rely on a single,
static teacher model. Additionally, we introduce a plug-in masking module,
independently trained to suppress modality-specific discrepancies and
reconstruct teacher representations, thereby mitigating knowledge drift and
enhancing transfer effectiveness. Extensive experiments across five diverse
multimodal datasets, spanning visual, audio, and text, demonstrate that our
method significantly outperforms existing state-of-the-art knowledge
distillation methods in cross-modal distillation tasks. The source code is
available at https://github.com/Gray-OREO/MST-Distill.
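To make the routing idea in the abstract more concrete, below is a minimal PyTorch-style sketch of instance-level routing over a pool of specialized teachers: a small routing network produces per-sample weights over the teachers' soft targets, and the student is distilled against the routed mixture. All names (TeacherRoutingDistiller, router, feat, etc.) are hypothetical illustrations under assumed shapes, not the authors' implementation; the actual MST-Distill architecture, including the plug-in masking module, is defined in the paper and the linked repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherRoutingDistiller(nn.Module):
    # Hypothetical sketch: per-instance routing over a pool of pretrained,
    # frozen cross-modal/multimodal teachers, weighting their soft targets
    # to form the student's distillation target.
    def __init__(self, student, teachers, feat_dim, temperature=4.0):
        super().__init__()
        self.student = student
        self.teachers = nn.ModuleList(teachers)   # assumed pretrained and frozen
        self.router = nn.Sequential(              # instance-level routing network
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, len(teachers)),
        )
        self.T = temperature

    def forward(self, x_student, x_teachers, feat):
        # Student prediction on its own modality.
        s_logits = self.student(x_student)

        # Soft targets from every specialized teacher (no gradients).
        with torch.no_grad():
            t_logits = torch.stack(
                [t(x) for t, x in zip(self.teachers, x_teachers)], dim=1
            )  # (batch, num_teachers, num_classes)

        # Per-instance weights over teachers, conditioned on an input feature.
        w = F.softmax(self.router(feat), dim=-1)          # (batch, num_teachers)
        t_soft = F.softmax(t_logits / self.T, dim=-1)
        mixed = (w.unsqueeze(-1) * t_soft).sum(dim=1)     # routed mixture of teachers

        # KL distillation loss of the student against the routed mixture.
        kd_loss = F.kl_div(
            F.log_softmax(s_logits / self.T, dim=-1), mixed,
            reduction="batchmean",
        ) * (self.T ** 2)
        return s_logits, kd_loss

In practice this KD term would be combined with the task loss on ground-truth labels, and the routing weights would be learned jointly with (or ahead of) the student; how MST-Distill trains its router and masking modules follows the procedure in the paper, not this sketch.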