Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation

Abstract

Large foundation models trained on extensive datasets demonstrate strong zero-shot capabilities across many domains. To replicate this success when data and model size are constrained, knowledge distillation has become an established tool for transferring knowledge from foundation models to small student networks. However, the effectiveness of distillation is critically limited by the available training data. This work addresses a common practical issue in knowledge distillation: covariate shift, where spurious features appear during training but not at test time. We ask: when these spurious features are unknown but a robust teacher is available, can the student also become robust to them? We address this problem by introducing a novel diffusion-based data augmentation strategy that generates images by maximizing the disagreement between the teacher and the student, thereby creating challenging samples on which the student struggles. Experiments show that our approach significantly improves worst-group and mean-group accuracy on CelebA and SpuCo Birds, as well as the spurious mAUC on Spurious ImageNet under covariate shift, outperforming state-of-the-art diffusion-based data augmentation baselines.
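The core idea of searching for inputs where the student departs from the teacher can be illustrated with a toy sketch. This is not the authors' method: it assumes the disagreement is measured as KL(p_teacher || p_student), replaces the diffusion model with direct projected gradient ascent on a 2-D "image" kept in [-1, 1], and uses hypothetical linear softmax classifiers (`TEACHER`, `STUDENT`) so the input gradient can be written analytically. All function names are illustrative.

```python
import math

# Hypothetical toy setup: the "image" is a 2-D vector x whose entries are kept in
# [-1, 1], and teacher and student are linear softmax classifiers (logits = W @ x).
# Real models would be deep networks and x would come from a diffusion model.


def matvec(w, x):
    """Logits W @ x for a weight matrix stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matvec_t(w, g):
    """W^T @ g, used to backpropagate a logit gradient to the input."""
    return [sum(w[k][j] * g[k] for k in range(len(w))) for j in range(len(w[0]))]


def log_softmax(z):
    """Numerically stable log-softmax (avoids log(0) when probabilities underflow)."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def disagreement(teacher, student, x):
    """KL(p_teacher(x) || p_student(x)): large where the student departs from the teacher."""
    log_p = log_softmax(matvec(teacher, x))
    log_q = log_softmax(matvec(student, x))
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def disagreement_grad(teacher, student, x):
    """Analytic gradient of the KL disagreement with respect to the input x."""
    log_p = log_softmax(matvec(teacher, x))
    log_q = log_softmax(matvec(student, x))
    p = [math.exp(v) for v in log_p]
    q = [math.exp(v) for v in log_q]
    r = [lp - lq for lp, lq in zip(log_p, log_q)]
    kl = sum(pk * rk for pk, rk in zip(p, r))
    grad_teacher_logits = [pk * (rk - kl) for pk, rk in zip(p, r)]  # dKL / d(teacher logits)
    grad_student_logits = [qk - pk for pk, qk in zip(p, q)]         # dKL / d(student logits)
    gt = matvec_t(teacher, grad_teacher_logits)
    gs = matvec_t(student, grad_student_logits)
    return [a + b for a, b in zip(gt, gs)]


def find_hard_sample(teacher, student, x0, step=0.5, n_steps=20):
    """Projected gradient ascent: move x toward where teacher and student disagree most."""
    x = list(x0)
    for _ in range(n_steps):
        g = disagreement_grad(teacher, student, x)
        # Take an ascent step, then clip back into the assumed [-1, 1] pixel range.
        x = [min(1.0, max(-1.0, xi + step * gi)) for xi, gi in zip(x, g)]
    return x


# Feature 0 plays the "core" cue the robust teacher uses; feature 1 plays a
# spurious cue the student has latched onto.
TEACHER = [[1.0, 0.0], [-1.0, 0.0]]
STUDENT = [[0.0, 1.0], [0.0, -1.0]]

x_start = [0.5, 0.4]  # both cues point the same way, so the models nearly agree
x_hard = find_hard_sample(TEACHER, STUDENT, x_start)
print(disagreement(TEACHER, STUDENT, x_start), disagreement(TEACHER, STUDENT, x_hard))
```

Starting from an input where the core and spurious cues agree, the ascent pushes the core feature up and the spurious feature down, producing a sample on which the student's prediction diverges from the teacher's. In a diffusion sampler, one plausible (assumed, not confirmed by the abstract) way to use such a gradient is classifier-guidance style, adding it to each denoising step instead of updating pixels directly.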
