

NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency

August 20, 2024
Authors: Valentinos Pariza, Mohammadreza Salehi, Gertjan Burghouts, Francesco Locatello, Yuki M. Asano
cs.AI

Abstract

We propose sorting patch representations across views as a novel self-supervised learning signal to improve pretrained representations. To this end, we introduce NeCo: Patch Neighbor Consistency, a novel training loss that enforces patch-level nearest-neighbor consistency between a student and a teacher model, relative to reference batches. Our method applies a differentiable sorting method on top of pretrained representations, such as DINOv2-registers, to bootstrap the learning signal and further improve upon them. This dense post-pretraining leads to superior performance across various models and datasets, despite requiring only 19 hours on a single GPU. We demonstrate that this method generates high-quality dense feature encoders and establishes several new state-of-the-art results: +5.5% and +6% for non-parametric in-context semantic segmentation on ADE20k and Pascal VOC, and +7.2% and +5.7% for linear segmentation evaluations on COCO-Things and -Stuff.
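The core idea — making a student's patch-level neighbor ordering over a reference batch agree with a teacher's — can be sketched in a few lines. The code below is a minimal NumPy illustration, not the paper's implementation: the function name `neco_loss`, the temperature values, and the use of a temperature softmax in place of the paper's differentiable sorting network are all assumptions for the sake of the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def neco_loss(student, teacher, reference, temp_s=0.1, temp_t=0.05):
    """Hypothetical sketch of a patch neighbor-consistency loss.

    student, teacher: (N, D) patch features from the two branches;
    reference:        (M, D) patch features from a reference batch.
    A temperature softmax stands in for the differentiable sorting
    used in the paper; a sharper teacher temperature gives the target.
    """
    # L2-normalize so dot products are cosine similarities.
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    # Soft nearest-neighbor distributions over the reference batch.
    p_s = softmax(s @ r.T / temp_s)   # student's soft neighbor ranking
    p_t = softmax(t @ r.T / temp_t)   # sharper teacher target
    # Cross-entropy aligns the student's neighbor ordering with the teacher's.
    return float(-(p_t * np.log(p_s + 1e-8)).sum(axis=1).mean())
```

In the full method this loss would be applied per patch across augmented views, with the teacher updated as a momentum copy of the student, as is standard in DINO-style self-distillation.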
