Category-Level 3D Correspondence in Camera Space via Morphable Object Priors

May 27, 2026
Authors: Leonhard Sommer, Artur Jesslen, Basavaraj Sunagad, Adam Kortylewski
cs.AI

Abstract

Understanding 3D objects from images is fundamental to robotics and AR/VR applications. While recent work has made progress in category-level pose estimation, current representations fail to capture the fine-grained semantics needed for reasoning about object parts, functions, and interactions. In this work, we study category-level 3D correspondence in camera space -- predicting, from a single image, 3D locations that remain consistent across instances within a category -- and show that it can emerge without explicit correspondence supervision by learning a shared morphable object prior. To enable research in this direction, we introduce HouseCorr3D, the first large-scale benchmark for monocular category-level 3D correspondence with 178k images across 50 household object categories, 280 unique instances, and 3D keypoint annotations directly on CAD models. Crucially, HouseCorr3D provides amodal correspondence labels for occluded regions and explicit symmetry annotations, addressing key limitations of existing datasets. We further propose Morpheus, a method that learns morphable category-level shape priors by disentangling canonical shape, deformation, and object pose. Through this shared canonical grounding, semantically meaningful 3D correspondences in camera space emerge implicitly. These emerging 3D correspondences set a new state of the art on HouseCorr3D, demonstrating that semantic 3D object understanding can arise without direct correspondence supervision. Data and code are publicly available at https://github.com/GenIntel/HouseCorr3D.
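To make the idea of "shared canonical grounding" concrete, the toy sketch below shows how correspondences can fall out of a shared template without being supervised directly. It is not the Morpheus implementation: the three-vertex template, the per-vertex offsets, the y-axis rotation, and the function names (`rotate_y`, `to_camera_space`) are all hypothetical choices made for illustration. The only assumption carried over from the abstract is the factorization into canonical shape, deformation, and object pose.

```python
import math

def rotate_y(p, angle):
    """Rotate a 3D point about the y-axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def to_camera_space(canonical, deformation, angle, translation):
    """Deform each canonical vertex, then apply a rigid pose (rotation + translation)."""
    points = []
    for v, d in zip(canonical, deformation):
        deformed = tuple(vi + di for vi, di in zip(v, d))
        rotated = rotate_y(deformed, angle)
        points.append(tuple(ri + ti for ri, ti in zip(rotated, translation)))
    return points

# Hypothetical canonical template shared by the whole category.
# Index 0 = handle tip, 1 = rim, 2 = base (labels are illustrative only).
canonical = [(1.0, 0.5, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]

# Two instances differ in deformation and pose but reuse the same vertex indices.
instance_a = to_camera_space(
    canonical,
    deformation=[(0.2, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.0)],
    angle=math.pi / 2,
    translation=(0.0, 0.0, 2.0),
)
instance_b = to_camera_space(
    canonical,
    deformation=[(-0.1, 0.0, 0.0), (0.0, 0.3, 0.0), (0.0, 0.0, 0.0)],
    angle=0.0,
    translation=(0.5, 0.0, 3.0),
)

# Points with the same canonical index correspond across instances,
# even though no pairwise correspondence label was ever provided.
correspondences = list(zip(instance_a, instance_b))
for idx, (pa, pb) in enumerate(correspondences):
    print(idx, pa, pb)
```

Because both instances are expressed through the same canonical vertex indices, matching index `i` in one instance to index `i` in the other yields a camera-space 3D correspondence. A learned model would have to predict the deformation and pose from a single image, which this sketch does not attempt.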