NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency

Abstract

We propose sorting patch representations across views as a novel self-supervised learning signal to improve pretrained representations. To this end, we introduce NeCo: Patch Neighbor Consistency, a new training loss that enforces patch-level nearest neighbor consistency across a student and teacher model, relative to reference batches. Our method leverages a differentiable sorting method applied on top of pretrained representations, such as DINOv2-registers, to bootstrap the learning signal and further improve upon them. This dense post-pretraining leads to superior performance across various models and datasets, despite requiring only 19 hours on a single GPU. We demonstrate that this method generates high-quality dense feature encoders and establish several new state-of-the-art results: +5.5% and +6% for non-parametric in-context semantic segmentation on ADE20k and Pascal VOC, respectively, and +7.2% and +5.7% for linear segmentation evaluations on COCO-Things and COCO-Stuff, respectively.
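
The abstract names the ingredients of the loss (student and teacher patch features, a reference batch, and a differentiable sort) without spelling out the computation. As a rough illustration only, the NumPy sketch below assumes that each patch is compared to a reference batch by cosine similarity, that a simple pairwise-sigmoid soft rank stands in for whatever differentiable sorting method NeCo actually uses, and that disagreement between the student and teacher orderings is measured by mean squared error. The function names, the temperature `tau`, and these modeling choices are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def stable_sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic function written via tanh to avoid overflow for large |x|."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def soft_rank(sim: np.ndarray, tau: float) -> np.ndarray:
    """Approximate 0-based descending ranks along the last axis.

    The rank of entry k counts how many entries in its row are larger; the hard
    indicator [s_j > s_k] is relaxed to sigmoid((s_j - s_k) / tau).
    """
    diff = sim[..., None, :] - sim[..., :, None]  # diff[..., k, j] = s_j - s_k
    # Summing over j includes the self-comparison sigmoid(0) = 0.5, so remove it.
    return stable_sigmoid(diff / tau).sum(axis=-1) - 0.5


def neighbor_consistency_loss(student_patches: np.ndarray,
                              teacher_patches: np.ndarray,
                              student_ref: np.ndarray,
                              teacher_ref: np.ndarray,
                              tau: float = 0.05) -> float:
    """Hypothetical NeCo-style loss: mean squared difference between the soft
    neighbor orderings that student and teacher induce over a reference batch."""
    # (N, M) cosine similarities between N patches and M reference patches.
    s_sim = l2_normalize(student_patches) @ l2_normalize(student_ref).T
    t_sim = l2_normalize(teacher_patches) @ l2_normalize(teacher_ref).T
    s_rank = soft_rank(s_sim, tau)
    t_rank = soft_rank(t_sim, tau)  # acts as the target ordering
    return float(np.mean((s_rank - t_rank) ** 2))


# Toy usage with random features (N=16 patches, M=8 reference patches, D=32).
# For simplicity both models share one set of reference features here.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 32))
reference = rng.normal(size=(8, 32))
student = teacher + 0.5 * rng.normal(size=teacher.shape)

print(neighbor_consistency_loss(teacher, teacher, reference, reference))  # 0.0
print(neighbor_consistency_loss(student, teacher, reference, reference))
```

Because NumPy has no automatic differentiation, this only shows the forward pass. Written in an autodiff framework, the same operations are differentiable with respect to the student's features, with the teacher's ranks held fixed as the target. Smaller values of `tau` push the soft ranks toward hard ranks, at the cost of gradients that vanish almost everywhere.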
