Emergent Correspondence from Image Diffusion

Abstract

Finding correspondences between images is a fundamental problem in computer vision. In this paper, we show that correspondence emerges in image diffusion models without any explicit supervision. We propose a simple strategy to extract this implicit knowledge out of diffusion networks as image features, namely DIffusion FeaTures (DIFT), and use them to establish correspondences between real images. Without any additional fine-tuning or supervision on the task-specific data or annotations, DIFT is able to outperform both weakly-supervised methods and competitive off-the-shelf features in identifying semantic, geometric, and temporal correspondences. Particularly for semantic correspondence, DIFT from Stable Diffusion is able to outperform DINO and OpenCLIP by 19 and 14 accuracy points respectively on the challenging SPair-71k benchmark. It even outperforms the state-of-the-art supervised methods on 9 out of 18 categories while remaining on par for the overall performance. Project page: https://diffusionfeatures.github.io
