Novel Object 6D Pose Estimation with a Single Reference View
March 7, 2025
Authors: Jian Liu, Wei Sun, Kai Zeng, Jin Zheng, Hui Yang, Lin Wang, Hossein Rahmani, Ajmal Mian
cs.AI
Abstract
Existing novel object 6D pose estimation methods typically rely on CAD models
or dense reference views, which are both difficult to acquire. Using only a
single reference view is more scalable, but challenging due to large pose
discrepancies and limited geometric and spatial information. To address these
issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose
estimation method. Our key idea is to iteratively establish point-wise
alignment in the camera coordinate system based on state space models (SSMs).
Specifically, iterative camera-space point-wise alignment can effectively
handle large pose discrepancies, while our proposed RGB and Points SSMs can
capture long-range dependencies and spatial information from a single view,
offering linear complexity and superior spatial modeling capability. Once
pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel
object using only a single reference view, without requiring retraining or a
CAD model. Extensive experiments on six popular datasets and real-world robotic
scenes demonstrate that we achieve on-par performance with CAD-based and dense
reference view-based methods, despite operating in the more challenging single
reference setting. Code will be released at
https://github.com/CNJianLiu/SinRef-6D.
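To make the abstract's core idea more concrete, the sketch below illustrates iterative point-wise alignment in the camera coordinate system. It is a minimal, illustrative reconstruction only: `pose_step` is a hypothetical stand-in for the learned RGB and Points SSM predictor described above, replaced here by a closed-form rigid fit (Kabsch) purely so the loop is runnable; the function names and signatures are assumptions, not the authors' API.

```python
# Minimal, illustrative sketch of iterative camera-space point-wise alignment.
# NOTE: `pose_step` is a hypothetical stand-in for the learned RGB/Points SSM
# predictor described in the abstract; here it uses a closed-form rigid fit
# (Kabsch) purely so the loop is runnable. This is NOT the authors' method.
import numpy as np


def pose_step(obs_points, ref_points):
    """Estimate a delta rotation/translation aligning ref_points to obs_points."""
    mu_o, mu_r = obs_points.mean(axis=0), ref_points.mean(axis=0)
    H = (ref_points - mu_r).T @ (obs_points - mu_o)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                  # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_r
    return R, t


def estimate_pose(obs_points, ref_points, num_iters=3):
    """Iteratively move reference-view points onto the observed camera-space
    points, accumulating the per-iteration updates into a single pose (R, t)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = ref_points.copy()
    for _ in range(num_iters):
        R, t = pose_step(obs_points, cur)                   # per-iteration delta pose
        cur = cur @ R.T + t                                 # re-align the reference points
        R_total, t_total = R @ R_total, R @ t_total + t     # compose with the running pose
    return R_total, t_total
```

With known point-wise correspondences the closed-form step above converges almost immediately; in the paper's setting the per-iteration update is instead predicted by the RGB and Points SSMs, which is what allows alignment under large pose discrepancies and limited geometry from a single reference view.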