MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation

July 9, 2025
Authors: Hui Li, Pengfei Yang, Juanyang Chen, Le Dong, Yanxin Chen, Quan Wang
cs.AI

Abstract

Knowledge distillation, an efficient knowledge transfer technique, has achieved remarkable success in unimodal scenarios. However, in cross-modal settings, conventional distillation methods face significant challenges due to data and statistical heterogeneity, and they fail to leverage the complementary prior knowledge embedded in cross-modal teacher models. This paper empirically reveals two critical issues in existing approaches: distillation path selection and knowledge drift. To address these limitations, we propose MST-Distill, a novel cross-modal knowledge distillation framework featuring a mixture of specialized teachers. Our approach employs a diverse ensemble of teacher models in both cross-modal and multimodal configurations, integrated with an instance-level routing network that enables adaptive, dynamic distillation. This architecture overcomes the limitations of traditional methods that rely on a single, static teacher model. Additionally, we introduce a plug-in masking module, trained independently, that suppresses modality-specific discrepancies and reconstructs teacher representations, thereby mitigating knowledge drift and improving transfer effectiveness. Extensive experiments on five diverse multimodal datasets spanning visual, audio, and text modalities show that our method significantly outperforms existing state-of-the-art knowledge distillation methods on cross-modal distillation tasks. The source code is available at https://github.com/Gray-OREO/MST-Distill.
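To make the idea of instance-level routing over several teachers more concrete, here is a minimal pure-Python sketch. It assumes a top-k softmax gate over per-instance router scores and a temperature-scaled KL distillation loss toward the gated mixture of teacher distributions. The function names, `top_k`, `temperature`, and the gating scheme are hypothetical illustrations and are not taken from the MST-Distill code. The masking module is not modelled here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def route_teachers(router_scores: Sequence[float], top_k: int) -> List[float]:
    """Hypothetical instance-level router: keep the top_k teachers by score,
    renormalise their gate weights with a softmax, and give the rest weight 0."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([router_scores[i] for i in chosen])
    weights = [0.0] * len(router_scores)
    for i, g in zip(chosen, gates):
        weights[i] = g
    return weights


def mixture_distill_loss(student_logits: Sequence[float],
                         teacher_logits: Sequence[Sequence[float]],
                         router_scores: Sequence[float],
                         top_k: int = 2,
                         temperature: float = 4.0) -> Tuple[float, List[float]]:
    """Distil one instance from a gated mixture of teacher distributions
    (an assumed formulation, not the paper's exact loss)."""
    weights = route_teachers(router_scores, top_k)
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    num_classes = len(student_logits)
    # Soft target: router-weighted average of the teachers' softened outputs.
    target = [sum(w * p[c] for w, p in zip(weights, teacher_probs))
              for c in range(num_classes)]
    student_probs = softmax(student_logits, temperature)
    # Hinton-style T^2 scaling keeps gradient magnitudes comparable across temperatures.
    loss = temperature ** 2 * kl_divergence(target, student_probs)
    return loss, weights


if __name__ == "__main__":
    student = [1.0, 0.5, -0.2]
    teachers = [
        [2.0, 0.1, -1.0],   # e.g. a multimodal teacher
        [1.5, 0.8, -0.5],   # e.g. a cross-modal teacher
        [-0.5, 2.0, 0.3],   # e.g. another cross-modal teacher
    ]
    scores = [0.9, 0.4, -1.2]  # per-instance router outputs (made up)
    loss, weights = mixture_distill_loss(student, teachers, scores, top_k=2)
    print("gate weights:", weights)
    print("distillation loss:", loss)
```

With `top_k=2`, the lowest-scoring teacher receives a gate weight of 0 for this instance, so a different input with different router scores could be distilled from a different subset of teachers. This per-instance selection is the behaviour the abstract contrasts with a single, static teacher.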