Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation
June 2, 2025
Authors: Niclas Popp, Kevin Alexander Laube, Matthias Hein, Lukas Schott
cs.AI
Abstract
Large foundation models trained on extensive datasets demonstrate strong
zero-shot capabilities in various domains. To replicate their success when data
and model size are constrained, knowledge distillation has become an
established tool for transferring knowledge from foundation models to small
student networks. However, the effectiveness of distillation is critically
limited by the available training data. This work addresses the common
practical issue of covariate shift in knowledge distillation, where spurious
features appear during training but not at test time. We ask the question: when
these spurious features are unknown, yet a robust teacher is available, is it
possible for a student to also become robust to them? We address this problem
by introducing a novel diffusion-based data augmentation strategy that
generates images by maximizing the disagreement between the teacher and the
student, effectively creating challenging samples that the student struggles
with. Experiments demonstrate that our approach significantly improves worst
group and mean group accuracy on CelebA and SpuCo Birds as well as the spurious
mAUC on spurious ImageNet under covariate shift, outperforming state-of-the-art
diffusion-based data augmentation baselines
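
The abstract does not give implementation details, but the core idea, steering a generative model toward images on which the robust teacher and the current student disagree, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' method: `decode`, `teacher`, and `student` are hypothetical placeholders for a differentiable image generator (e.g. one denoising/decoding step of a diffusion model) and the two classifiers, and KL divergence is assumed here as one possible disagreement measure.

```python
import torch
import torch.nn.functional as F


def teacher_student_disagreement(teacher_logits, student_logits):
    # KL(teacher || student), averaged over the batch: large when the student's
    # predictions deviate from the (robust) teacher's on an image.
    log_t = F.log_softmax(teacher_logits, dim=-1)
    log_s = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_s, log_t, log_target=True, reduction="batchmean")


def guide_latents(latents, decode, teacher, student, steps=10, step_size=0.05):
    # Gradient ascent on the generator's latent code so that the decoded images
    # maximize teacher-student disagreement, i.e. land in regions the student
    # currently handles poorly. All three callables are placeholders.
    latents = latents.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latents], lr=step_size)
    for _ in range(steps):
        images = decode(latents)  # assumed differentiable decoding of latents to images
        loss = -teacher_student_disagreement(teacher(images), student(images))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return latents.detach()
```

In a full pipeline, the guided latents would then be decoded into augmentation images and mixed into the distillation training set, so the student is repeatedly confronted with samples it currently gets wrong relative to the teacher.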