Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation
March 3, 2026
作者: Chonghua Lv, Dong Zhao, Shuang Wang, Dou Quan, Ning Huyan, Nicu Sebe, Zhun Zhong
cs.AI
Abstract
Knowledge distillation (KD) has been widely applied in semantic segmentation to compress large models, but conventional approaches primarily preserve in-domain accuracy while neglecting out-of-domain generalization, which is essential under distribution shifts. This limitation becomes more severe with the emergence of vision foundation models (VFMs): although VFMs exhibit strong robustness on unseen data, distilling them with conventional KD often compromises this ability. We propose Generalizable Knowledge Distillation (GKD), a multi-stage framework that explicitly enhances generalization. GKD decouples representation learning from task learning. In the first stage, the student acquires domain-agnostic representations through selective feature distillation; in the second stage, these representations are frozen for task adaptation, thereby mitigating overfitting to seen domains. To further support transfer, we introduce a query-based soft distillation mechanism, in which student features act as queries over teacher representations, selectively retrieving transferable spatial knowledge from VFMs. Extensive experiments on five domain generalization benchmarks demonstrate that GKD consistently outperforms existing KD methods, achieving average gains of +1.9% in foundation-to-foundation (F2F) and +10.6% in foundation-to-local (F2L) distillation. The code will be available at https://github.com/Younger-hua/GKD.
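The query-based soft distillation described above can be pictured as a cross-attention-style retrieval step: student features serve as queries, and a softmax over their similarities to teacher features yields soft weights for aggregating teacher (VFM) knowledge. The sketch below is a minimal illustration of this idea in numpy; the function name, shapes, and temperature parameter are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def query_based_soft_distillation(student_feats, teacher_feats, tau=1.0):
    """Hypothetical sketch of query-based soft retrieval.

    student_feats: (N, d) flattened spatial features of the student (queries)
    teacher_feats: (M, d) flattened spatial features of the VFM teacher (keys/values)
    tau: softmax temperature controlling how "soft" the retrieval is
    Returns an (N, d) array of teacher knowledge aligned to the student queries.
    """
    d = student_feats.shape[1]
    # similarity of every student query to every teacher location (scaled dot product)
    sim = student_feats @ teacher_feats.T / np.sqrt(d)
    # temperature-scaled softmax over teacher locations -> soft retrieval weights
    w = np.exp(sim / tau)
    w /= w.sum(axis=1, keepdims=True)
    # each student location receives a convex combination of teacher features,
    # which could then serve as its distillation target
    return w @ teacher_feats

# Toy shapes: 4 student locations, 6 teacher locations, 8-dim features
rng = np.random.default_rng(0)
s = rng.normal(size=(4, 8))
t = rng.normal(size=(6, 8))
target = query_based_soft_distillation(s, t)
print(target.shape)  # (4, 8)
```

A lower `tau` sharpens the retrieval toward the single most similar teacher location, while a higher `tau` averages more broadly; the actual GKD mechanism and its parameterization are detailed in the paper and code release.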