Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation
June 2, 2025
Authors: Niclas Popp, Kevin Alexander Laube, Matthias Hein, Lukas Schott
cs.AI
Abstract
Large foundation models trained on extensive datasets demonstrate strong
zero-shot capabilities in various domains. To replicate their success when data
and model size are constrained, knowledge distillation has become an
established tool for transferring knowledge from foundation models to small
student networks. However, the effectiveness of distillation is critically
limited by the available training data. This work addresses the common
practical issue of covariate shift in knowledge distillation, where spurious
features appear during training but not at test time. We ask the question: when
these spurious features are unknown, yet a robust teacher is available, is it
possible for a student to also become robust to them? We address this problem
by introducing a novel diffusion-based data augmentation strategy that
generates images by maximizing the disagreement between the teacher and the
student, effectively creating challenging samples that the student struggles
with. Experiments demonstrate that our approach significantly improves worst
group and mean group accuracy on CelebA and SpuCo Birds as well as the spurious
mAUC on spurious ImageNet under covariate shift, outperforming state-of-the-art
diffusion-based data augmentation baselinesSummary