NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency
August 20, 2024
Authors: Valentinos Pariza, Mohammadreza Salehi, Gertjan Burghouts, Francesco Locatello, Yuki M. Asano
cs.AI
Abstract
We propose sorting patch representations across views as a novel
self-supervised learning signal to improve pretrained representations. To this
end, we introduce NeCo: Patch Neighbor Consistency, a novel training loss that
enforces patch-level nearest neighbor consistency across a student and teacher
model, relative to reference batches. Our method leverages a differentiable
sorting method applied on top of pretrained representations, such as
DINOv2-registers, to bootstrap the learning signal and further improve upon
them. This dense post-pretraining leads to superior performance across various
models and datasets, despite requiring only 19 hours on a single GPU. We
demonstrate that this method generates high-quality dense feature encoders and
establish several new state-of-the-art results: +5.5% and +6% for
non-parametric in-context semantic segmentation on ADE20k and Pascal VOC, and
+7.2% and +5.7% for linear segmentation evaluations on COCO-Things and -Stuff.
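The core idea of the loss described above can be sketched in a few lines. The following is a simplified NumPy illustration, not the authors' implementation: the function names are hypothetical, a pairwise-sigmoid soft rank stands in for the paper's differentiable sorting method, and the squared difference between the student's and teacher's soft rankings is an assumed stand-in for the actual training objective.

```python
import numpy as np

def soft_rank(sims, temperature=0.1):
    """Soft (differentiable) rank of each similarity within its row.

    sims: (num_patches, num_refs) similarities of patches to a reference batch.
    Higher similarity -> higher soft rank.
    """
    # diff[p, i, j] = sims[p, i] - sims[p, j]
    diff = sims[:, :, None] - sims[:, None, :]
    # Sigmoid relaxation of the indicator 1[sims[p, i] > sims[p, j]],
    # summed over j to give a smooth rank in [0, num_refs].
    return (1.0 / (1.0 + np.exp(-diff / temperature))).sum(axis=-1)

def neco_consistency_loss(student_patches, teacher_patches,
                          reference_patches, temperature=0.1):
    """Penalize disagreement between the student's and teacher's
    nearest-neighbor orderings relative to a shared reference batch.
    (Illustrative sketch only; not the paper's exact objective.)"""
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    # Cosine similarity of each patch feature to every reference patch
    s_sims = l2norm(student_patches) @ l2norm(reference_patches).T
    t_sims = l2norm(teacher_patches) @ l2norm(reference_patches).T
    # Mean squared difference between the two soft rankings
    return np.mean((soft_rank(s_sims, temperature)
                    - soft_rank(t_sims, temperature)) ** 2)
```

In an actual training loop this would be computed on dense ViT patch tokens with an autograd framework, so that gradients flow through the soft ranking into the student encoder while the teacher is held fixed (e.g. as an EMA of the student).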