NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency
August 20, 2024
Authors: Valentinos Pariza, Mohammadreza Salehi, Gertjan Burghouts, Francesco Locatello, Yuki M. Asano
cs.AI
Abstract
We propose sorting patch representations across views as a novel
self-supervised learning signal to improve pretrained representations. To this
end, we introduce NeCo: Patch Neighbor Consistency, a novel training loss that
enforces patch-level nearest neighbor consistency across a student and teacher
model, relative to reference batches. Our method leverages a differentiable
sorting method applied on top of pretrained representations, such as
DINOv2-registers, to bootstrap the learning signal and further improve upon
them. This dense post-pretraining leads to superior performance across various
models and datasets, despite requiring only 19 hours on a single GPU. We
demonstrate that this method generates high-quality dense feature encoders and
establish several new state-of-the-art results: +5.5% and +6% for
non-parametric in-context semantic segmentation on ADE20k and Pascal VOC, and
+7.2% and +5.7% for linear segmentation evaluations on COCO-Things and -Stuff.
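The core idea of the loss described above can be sketched in a few lines. The following is a simplified NumPy illustration, not the authors' implementation: the function names are hypothetical, a pairwise-sigmoid soft rank stands in for the paper's differentiable sorting method, and the squared difference between the student's and teacher's soft rankings is an assumed stand-in for the actual training objective.

```python
import numpy as np

def soft_rank(sims, temperature=0.1):
    """Soft (differentiable) rank of each similarity within its row.

    sims: (num_patches, num_refs) similarities of patches to a reference batch.
    Higher similarity -> higher soft rank.
    """
    # diff[p, i, j] = sims[p, i] - sims[p, j]
    diff = sims[:, :, None] - sims[:, None, :]
    # Sigmoid relaxation of the indicator 1[sims[p, i] > sims[p, j]],
    # summed over j to give a smooth rank in [0, num_refs].
    return (1.0 / (1.0 + np.exp(-diff / temperature))).sum(axis=-1)

def neco_consistency_loss(student_patches, teacher_patches,
                          reference_patches, temperature=0.1):
    """Penalize disagreement between the student's and teacher's
    nearest-neighbor orderings relative to a shared reference batch.
    (Illustrative sketch only; not the paper's exact objective.)"""
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    # Cosine similarity of each patch feature to every reference patch
    s_sims = l2norm(student_patches) @ l2norm(reference_patches).T
    t_sims = l2norm(teacher_patches) @ l2norm(reference_patches).T
    # Mean squared difference between the two soft rankings
    return np.mean((soft_rank(s_sims, temperature)
                    - soft_rank(t_sims, temperature)) ** 2)
```

In an actual training loop this would be computed on dense ViT patch tokens with an autograd framework, so that gradients flow through the soft ranking into the student encoder while the teacher is held fixed (e.g. as an EMA of the student).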