Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation

Abstract

Knowledge distillation (KD) has been widely applied in semantic segmentation to compress large models, but conventional approaches focus on preserving in-domain accuracy and neglect out-of-domain generalization, which is essential under distribution shift. This limitation becomes more pronounced with the emergence of vision foundation models (VFMs): although VFMs are strongly robust on unseen data, distilling them with conventional KD often sacrifices that robustness. We propose Generalizable Knowledge Distillation (GKD), a multi-stage framework that explicitly enhances generalization by decoupling representation learning from task learning. In the first stage, the student acquires domain-agnostic representations through selective feature distillation. In the second stage, these representations are frozen during task adaptation, which mitigates overfitting to the seen domains. To further support transfer, we introduce a query-based soft distillation mechanism in which student features act as queries over teacher representations, selectively retrieving transferable spatial knowledge from the VFM. Extensive experiments on five domain generalization benchmarks show that GKD consistently outperforms existing KD methods, with average gains of +1.9% in foundation-to-foundation (F2F) distillation and +10.6% in foundation-to-local (F2L) distillation. The code will be available at https://github.com/Younger-hua/GKD.
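The abstract gives no implementation details for the query-based soft distillation mechanism, so the following is only a minimal sketch of the general idea: student features used as queries that softly retrieve teacher features. It assumes plain scaled dot-product attention, student features already projected to the teacher's feature dimension, and a mean-squared-error objective. The function names (`query_soft_targets`, `distill_loss`) and all of these choices are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def query_soft_targets(student_queries, teacher_feats):
    """Each student feature queries all teacher features (assumed attention form).

    Returns the attention weights and the softly retrieved teacher feature
    for every student query. Both inputs are lists of equal-length vectors.
    """
    dim = len(teacher_feats[0])
    scale = math.sqrt(dim)
    weights, targets = [], []
    for q in student_queries:
        # Similarity of this student query to every teacher location.
        w = softmax([dot(q, t) / scale for t in teacher_feats])
        # Weighted sum of teacher features: the soft distillation target.
        target = [sum(w_j * t[k] for w_j, t in zip(w, teacher_feats))
                  for k in range(dim)]
        weights.append(w)
        targets.append(target)
    return weights, targets


def distill_loss(student_queries, targets):
    """Mean squared error between student features and retrieved targets."""
    n = len(student_queries) * len(student_queries[0])
    return sum((q_k - t_k) ** 2
               for q, t in zip(student_queries, targets)
               for q_k, t_k in zip(q, t)) / n


# Toy data: 4 student locations, 6 teacher locations, feature dimension 8.
random.seed(0)
dim = 8
student_queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
teacher_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]

weights, targets = query_soft_targets(student_queries, teacher_feats)
loss = distill_loss(student_queries, targets)
print(len(weights), len(weights[0]), loss >= 0.0)
```

In this sketch each student location receives its own mixture of teacher locations rather than a fixed one-to-one match, which is one plausible reading of "selectively retrieving" spatial knowledge. In a two-stage setup like the one the abstract describes, a loss of this kind would belong to the first (representation) stage, with the resulting student encoder frozen in the second.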
