
SHIC: Shape-Image Correspondences with no Keypoint Supervision

July 26, 2024
Authors: Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
cs.AI

Abstract

Canonical surface mapping generalizes keypoint detection by assigning each pixel of an object to a corresponding point in a 3D template. Since DensePose popularised the concept for the analysis of humans, authors have attempted to apply it to more categories, but with limited success due to the high cost of manual supervision. In this work, we introduce SHIC, a method to learn canonical maps without manual supervision which achieves better results than supervised methods for most categories. Our idea is to leverage foundation computer vision models such as DINO and Stable Diffusion, which are open-ended and thus possess excellent priors over natural categories. SHIC reduces the problem of estimating image-to-template correspondences to predicting image-to-image correspondences using features from the foundation models. The reduction works by matching images of the object to non-photorealistic renders of the template, which emulates the process of collecting manual annotations for this task. These correspondences are then used to supervise high-quality canonical maps for any object of interest. We also show that image generators can further improve the realism of the template views, which provide an additional source of supervision for the model.
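
The reduction from image-to-template to image-to-image matching can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that per-pixel features for the object image and for the template renders have already been extracted (for example with a foundation model such as DINO), and that the template vertex visible at each rendered pixel is known from the renderer. The function name `match_pixels_to_vertices` and all of its parameters are hypothetical, and plain cosine-similarity nearest-neighbour matching stands in for whatever matching procedure the paper actually uses.

```python
import numpy as np


def match_pixels_to_vertices(image_feats, render_feats, render_vertex_ids, eps=1e-8):
    """Assign each image pixel to a template vertex by feature matching.

    image_feats:       (N, D) features of N object pixels (assumed precomputed).
    render_feats:      (M, D) features of M pixels taken from template renders.
    render_vertex_ids: (M,) template vertex index visible at each rendered pixel.
    Returns the matched vertex index and cosine similarity for every image pixel.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    img = image_feats / (np.linalg.norm(image_feats, axis=1, keepdims=True) + eps)
    ren = render_feats / (np.linalg.norm(render_feats, axis=1, keepdims=True) + eps)

    similarity = img @ ren.T            # (N, M) image-to-render similarities
    best = similarity.argmax(axis=1)    # best rendered pixel for each image pixel

    # The rendered pixel's known vertex turns the image-to-image match
    # into an image-to-template correspondence.
    scores = similarity[np.arange(len(best)), best]
    return render_vertex_ids[best], scores


# Toy usage with random stand-in features (hypothetical sizes).
rng = np.random.default_rng(0)
render_feats = rng.normal(size=(50, 16))
render_vertex_ids = rng.integers(0, 1000, size=50)
image_feats = rng.normal(size=(20, 16))
vertices, scores = match_pixels_to_vertices(image_feats, render_feats, render_vertex_ids)
```

In a pipeline of the kind the abstract describes, matches like these would act as pseudo-labels for training a canonical map; the similarity scores could be used to discard low-confidence matches, although that filtering step is an assumption here rather than something the abstract states.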

