Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation
March 3, 2026
Authors: Chonghua Lv, Dong Zhao, Shuang Wang, Dou Quan, Ning Huyan, Nicu Sebe, Zhun Zhong
cs.AI
Abstract
Knowledge distillation (KD) has been widely applied in semantic segmentation to compress large models, but conventional approaches primarily preserve in-domain accuracy while neglecting out-of-domain generalization, which is essential under distribution shifts. This limitation becomes more severe with the emergence of vision foundation models (VFMs): although VFMs exhibit strong robustness on unseen data, distilling them with conventional KD often compromises this ability. We propose Generalizable Knowledge Distillation (GKD), a multi-stage framework that explicitly enhances generalization. GKD decouples representation learning from task learning: in the first stage, the student acquires domain-agnostic representations through selective feature distillation; in the second stage, these representations are frozen for task adaptation, thereby mitigating overfitting to seen domains. To further support transfer, we introduce a query-based soft distillation mechanism, in which student features act as queries over teacher representations to selectively retrieve transferable spatial knowledge from VFMs. Extensive experiments on five domain generalization benchmarks demonstrate that GKD consistently outperforms existing KD methods, achieving average gains of +1.9% in foundation-to-foundation (F2F) and +10.6% in foundation-to-local (F2L) distillation. The code will be available at https://github.com/Younger-hua/GKD.
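The query-based soft distillation described above can be read as a form of cross-attention: student features act as queries against teacher features (keys/values), so each student location retrieves the teacher knowledge most relevant to it. Below is a minimal NumPy sketch of that retrieval step under assumed shapes and naming (`query_based_retrieve`, the token counts, and the MSE matching loss are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_based_retrieve(student_feats, teacher_feats):
    """Student features (N, d) act as queries over teacher features (M, d).

    Returns a (N, d) tensor of teacher knowledge softly aligned to each
    student location via scaled dot-product attention weights.
    """
    d = student_feats.shape[-1]
    attn = softmax(student_feats @ teacher_feats.T / np.sqrt(d))  # (N, M)
    return attn @ teacher_feats  # (N, d)

# Hypothetical sizes: 16 student tokens, 64 teacher tokens, feature dim 32.
rng = np.random.default_rng(0)
student = rng.standard_normal((16, 32))
teacher = rng.standard_normal((64, 32))
retrieved = query_based_retrieve(student, teacher)

# One plausible soft-distillation objective: match student features to the
# retrieved teacher features (MSE here purely as an illustration).
loss = np.mean((student - retrieved) ** 2)
```

Because the attention weights sum to 1 per student query, each student location receives a convex combination of teacher features rather than a hard one-to-one match, which is what makes the retrieval "soft" and selective.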