SHIC: Shape-Image Correspondences with no Keypoint Supervision

July 26, 2024
Authors: Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
cs.AI

Abstract

Canonical surface mapping generalizes keypoint detection by assigning each pixel of an object to a corresponding point in a 3D template. Since DensePose popularised the concept for the analysis of humans, authors have attempted to apply it to more categories, but with limited success due to the high cost of manual supervision. In this work, we introduce SHIC, a method to learn canonical maps without manual supervision, which achieves better results than supervised methods for most categories. Our idea is to leverage foundation computer vision models such as DINO and Stable Diffusion that are open-ended and thus possess excellent priors over natural categories. SHIC reduces the problem of estimating image-to-template correspondences to predicting image-to-image correspondences using features from the foundation models. The reduction works by matching images of the object to non-photorealistic renders of the template, which emulates the process of collecting manual annotations for this task. These correspondences are then used to supervise high-quality canonical maps for any object of interest. We also show that image generators can further improve the realism of the template views, which provides an additional source of supervision for the model.
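
The reduction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-pixel feature vectors have already been extracted (in SHIC these would come from foundation models such as DINO or Stable Diffusion; here they are plain arrays), and it uses simple cosine nearest-neighbour matching as a stand-in for whatever matching procedure the paper uses. The names `match_features`, `image_to_template`, and `render_vertex_ids` are hypothetical. The key point is that every pixel of a template render has a known location on the 3D template, so an image-to-render match immediately yields an image-to-template correspondence.

```python
import numpy as np


def match_features(image_feats, render_feats, eps=1e-8):
    """Match each image feature to its most similar render feature.

    image_feats:  (N, D) array, one feature vector per image pixel.
    render_feats: (M, D) array, one feature vector per render pixel.
    Returns (indices, scores): best render index and cosine similarity
    for each of the N image pixels.
    """
    # L2-normalise so the dot product equals cosine similarity.
    a = image_feats / (np.linalg.norm(image_feats, axis=1, keepdims=True) + eps)
    b = render_feats / (np.linalg.norm(render_feats, axis=1, keepdims=True) + eps)
    sim = a @ b.T  # (N, M) similarity matrix
    idx = sim.argmax(axis=1)
    return idx, sim[np.arange(len(idx)), idx]


def image_to_template(image_feats, render_feats, render_vertex_ids):
    """Lift image-to-render matches to image-to-template correspondences.

    render_vertex_ids: (M,) array giving, for each render pixel, the id of
    the template vertex it depicts (hypothetical; known by construction
    because the render is produced from the template).
    """
    idx, scores = match_features(image_feats, render_feats)
    return render_vertex_ids[idx], scores


# Toy usage: three render pixels with orthogonal features, and three image
# pixels whose features are scaled, reordered copies of them.
render_feats = np.eye(3)
render_vertex_ids = np.array([10, 20, 30])
image_feats = 2.0 * np.eye(3)[[2, 0, 1]]
vertices, scores = image_to_template(image_feats, render_feats, render_vertex_ids)
```

In a pipeline of the kind the abstract describes, correspondences like `vertices` would serve as pseudo-labels for training a canonical map; a confidence threshold on `scores` is one plausible way to discard unreliable matches, though the abstract does not specify how this is done.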
