Training a Student Expert via Semi-Supervised Foundation Model Distillation

Abstract

Foundation models deliver strong perception but are often too computationally heavy to deploy, and adapting them typically requires costly annotations. We introduce a semi-supervised knowledge distillation (SSKD) framework that compresses pre-trained vision foundation models (VFMs) into compact experts using limited labeled and abundant unlabeled data, and instantiate it for instance segmentation, where per-pixel labels are particularly expensive. The framework unfolds in three stages: (1) domain adaptation of the VFM(s) via self-training with contrastive calibration, (2) knowledge transfer through a unified multi-objective loss, and (3) student refinement to mitigate residual pseudo-label bias. Central to our approach is an instance-aware pixel-wise contrastive loss that fuses mask and class scores to extract informative negatives and enforce clear inter-instance margins. By maintaining this contrastive signal across both adaptation and distillation, we align teacher and student embeddings and leverage unlabeled images more effectively. On Cityscapes and ADE20K, our approximately 11× smaller student improves over its zero-shot VFM teacher(s) by +11.9 and +8.6 AP, respectively, surpasses the adapted teacher(s) by +3.4 and +1.5 AP, and outperforms state-of-the-art SSKD methods on both benchmarks.
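To make the idea of an instance-aware pixel-wise contrastive loss more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how mask and class scores are fused or how negatives are selected, so everything here is an assumption for illustration: the function name `pixel_contrastive_loss`, the fusion by product, the `min_confidence` threshold, and the InfoNCE-style form with a `temperature` parameter.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def pixel_contrastive_loss(embeddings, instance_ids, mask_scores, class_scores,
                           temperature=0.1, min_confidence=0.5):
    """InfoNCE-style loss over pixel embeddings (illustrative only).

    Pixels of the same instance are treated as positives and pixels of
    other instances as negatives. Fusing scores by multiplication and
    thresholding the result are assumptions, not details from the paper.
    """
    z = [normalize(e) for e in embeddings]
    # Assumed fusion rule: confidence = mask score * class score.
    fused = [m * c for m, c in zip(mask_scores, class_scores)]
    # Keep only pixels whose (pseudo-)label is confident enough.
    keep = [i for i, c in enumerate(fused) if c >= min_confidence]

    losses = []
    for i in keep:
        pos = [j for j in keep if j != i and instance_ids[j] == instance_ids[i]]
        neg = [j for j in keep if instance_ids[j] != instance_ids[i]]
        if not pos or not neg:
            continue  # an anchor needs at least one positive and one negative
        neg_term = sum(math.exp(dot(z[i], z[j]) / temperature) for j in neg)
        for j in pos:
            pos_term = math.exp(dot(z[i], z[j]) / temperature)
            losses.append(-math.log(pos_term / (pos_term + neg_term)))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    ids = [0, 0, 1, 1]
    scores = [0.9, 0.9, 0.9, 0.9]
    separated = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
    print(pixel_contrastive_loss(separated, ids, scores, scores))
    print(pixel_contrastive_loss(mixed, ids, scores, scores))
```

In this sketch, embeddings that cluster by instance produce a smaller loss than embeddings that mix instances, which is the margin-enforcing behaviour the abstract describes. A practical version would operate on batched tensors and would likely sample negatives rather than compare every pair of pixels.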
