NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency

Abstract

We propose sorting patch representations across views as a novel self-supervised learning signal to improve pretrained representations. To this end, we introduce NeCo: Patch Neighbor Consistency, a novel training loss that enforces patch-level nearest neighbor consistency across a student and teacher model, relative to reference batches. Our method leverages a differentiable sorting method applied on top of pretrained representations, such as DINOv2-registers, to bootstrap the learning signal and further improve upon them. This dense post-pretraining leads to superior performance across various models and datasets, despite requiring only 19 hours on a single GPU. We demonstrate that this method generates high-quality dense feature encoders and establish several new state-of-the-art results: +5.5% and +6% for non-parametric in-context semantic segmentation on ADE20k and Pascal VOC, and +7.2% and +5.7% for linear segmentation evaluations on COCO-Things and -Stuff.
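The idea of neighbor consistency relative to a reference batch can be illustrated with a small, forward-only sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names are hypothetical, a pairwise-sigmoid soft rank stands in for the differentiable sorting method mentioned in the abstract, and a mean squared difference between soft ranks stands in for the actual training loss.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two feature vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def soft_ranks(distances, temperature=0.1):
    """Smooth approximation of each distance's rank (0 = nearest).

    Assumption: a pairwise-sigmoid relaxation, used here only as a
    stand-in for a differentiable sorting operator.
    """
    ranks = []
    for i, d_i in enumerate(distances):
        rank = 0.0
        for j, d_j in enumerate(distances):
            if j != i:
                # Close to 1 when d_i > d_j, close to 0 when d_i < d_j.
                rank += 1.0 / (1.0 + math.exp(-(d_i - d_j) / temperature))
        ranks.append(rank)
    return ranks


def neighbor_consistency_loss(student_patch, teacher_patch, references,
                              temperature=0.1):
    """Penalize disagreement between student and teacher neighbor orderings.

    Assumption: mean squared difference of soft ranks, chosen for
    simplicity; the paper's loss may differ.
    """
    s = soft_ranks([cosine_distance(student_patch, r) for r in references],
                   temperature)
    t = soft_ranks([cosine_distance(teacher_patch, r) for r in references],
                   temperature)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(references)


# Toy reference patches (hypothetical 2-D features).
references = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
same = neighbor_consistency_loss([1.0, 0.1], [1.0, 0.1], references)
different = neighbor_consistency_loss([1.0, 0.1], [0.1, 1.0], references)
```

In this toy setup the loss is zero when the student and teacher patches order the reference patches identically, and positive when their nearest neighbors differ. In an actual training setup the same computation would be written in an autodiff framework so that gradients flow through the soft ranks into the student.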
