
Category-Level 3D Correspondence in Camera Space via Morphable Object Priors

May 27, 2026
Authors: Leonhard Sommer, Artur Jesslen, Basavaraj Sunagad, Adam Kortylewski
cs.AI

Abstract

Understanding 3D objects from images is fundamental to robotics and AR/VR applications. While recent work has made progress in category-level pose estimation, current representations fail to capture the fine-grained semantics needed for reasoning about object parts, functions, and interactions. In this work, we study category-level 3D correspondence in camera space -- predicting, from a single image, 3D locations that remain consistent across instances within a category -- and show that it can emerge without explicit correspondence supervision by learning a shared morphable object prior. To enable research in this direction, we introduce HouseCorr3D, the first large-scale benchmark for monocular category-level 3D correspondence with 178k images across 50 household object categories, 280 unique instances, and 3D keypoint annotations directly on CAD models. Crucially, HouseCorr3D provides amodal correspondence labels for occluded regions and explicit symmetry annotations, addressing key limitations of existing datasets. We further propose Morpheus, a method that learns morphable category-level shape priors by disentangling canonical shape, deformation, and object pose. Through this shared canonical grounding, semantically meaningful 3D correspondences in camera space emerge implicitly. These emerging 3D correspondences set a new state of the art on HouseCorr3D, demonstrating that semantic 3D object understanding can arise without direct correspondence supervision. Data and code are publicly available at https://github.com/GenIntel/HouseCorr3D.
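To make the idea of correspondence through a shared canonical shape concrete, the following is a minimal sketch, not the Morpheus implementation. It assumes a point-based canonical shape, additive per-point deformations, and a rigid object-to-camera pose; the function name `to_camera_space` and all numeric values are hypothetical. Because every instance is expressed as a deformation of the same canonical points, the point with index i in one instance corresponds to the point with index i in another, even though their camera-space coordinates differ.

```python
import numpy as np

def to_camera_space(canonical, deformation, R, t):
    """Deform canonical points, then apply a rigid object-to-camera pose.

    canonical, deformation: (N, 3) arrays; R: (3, 3) rotation; t: (3,) translation.
    """
    return (canonical + deformation) @ R.T + t

# Hypothetical shared canonical shape: 4 points, indexed identically for all instances.
canonical = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])

# Instance A: stretched along x, unrotated, 2 units in front of the camera.
def_a = canonical * np.array([0.5, 0.0, 0.0])
R_a = np.eye(3)
t_a = np.array([0.0, 0.0, 2.0])

# Instance B: shortened along z, rotated 90 degrees about the z-axis.
def_b = canonical * np.array([0.0, 0.0, -0.25])
R_b = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 1.0]])
t_b = np.array([0.5, 0.0, 3.0])

pts_a = to_camera_space(canonical, def_a, R_a, t_a)
pts_b = to_camera_space(canonical, def_b, R_b, t_b)

# Row i of pts_a and row i of pts_b are corresponding 3D points in camera space.
for i, (pa, pb) in enumerate(zip(pts_a, pts_b)):
    print(i, pa, pb)
```

In this sketch the correspondence is never supervised directly: it follows from both instances sharing one canonical indexing, which mirrors the abstract's claim at a toy scale.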