Novel Object 6D Pose Estimation with a Single Reference View

Abstract

Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, both of which are difficult to acquire. Using only a single reference view is more scalable, but it is challenging because of large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method. Our key idea is to iteratively establish point-wise alignment in the camera coordinate system based on state space models (SSMs). Specifically, iterative camera-space point-wise alignment can effectively handle large pose discrepancies, while our proposed RGB and Points SSMs capture long-range dependencies and spatial information from a single view, offering linear complexity and superior spatial modeling capability. Once pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel object using only a single reference view, without requiring retraining or a CAD model. Extensive experiments on six popular datasets and real-world robotic scenes demonstrate that SinRef-6D achieves performance on par with CAD-based and dense reference view-based methods, despite operating in the more challenging single-reference setting. Code will be released at https://github.com/CNJianLiu/SinRef-6D.
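To make iterative camera-space point-wise alignment concrete, here is a minimal NumPy sketch under stated assumptions; it is not the SinRef-6D implementation. It assumes a correspondence predictor, the hypothetical callable `predict_targets` (a learned network would play this role), that maps each reference point, posed by the current estimate, to its counterpart in the query camera frame. Each iteration then solves for the residual rigid transform in closed form with the Kabsch/SVD method (an assumed choice of solver) and composes it with the running estimate. In the usage lines, an oracle predictor stands in for the network, so a 120° rotation is recovered in the first iteration.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays whose rows are corresponding points.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def iterative_alignment(ref_pts, predict_targets, num_iters=3):
    """Refine a camera-space pose by repeated point-wise alignment.

    ref_pts: (N, 3) points back-projected from the reference view.
    predict_targets: hypothetical callable; given the reference points under
        the current pose estimate, returns their (N, 3) predicted counterparts
        in the query camera frame. A learned network would play this role.
    """
    R, t = np.eye(3), np.zeros(3)
    for _ in range(num_iters):
        posed = ref_pts @ R.T + t          # apply the current estimate
        targets = predict_targets(posed)   # point-wise correspondences
        dR, dt = kabsch(posed, targets)    # residual rigid update
        R, t = dR @ R, dR @ t + dt         # compose: x -> dR(Rx + t) + dt
    return R, t


# Synthetic example with a large (120 degree) pose discrepancy.
rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 3))
ang = np.deg2rad(120.0)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                   [np.sin(ang),  np.cos(ang), 0.0],
                   [0.0,          0.0,         1.0]])
t_true = np.array([0.1, -0.2, 0.5])
query = ref @ R_true.T + t_true

# Oracle predictor standing in for the network: row i of `query`
# is the true counterpart of row i of the reference points.
R_est, t_est = iterative_alignment(ref, lambda posed: query)
```

In this sketch, the closed-form solve does not require the starting pose to be close to the true one, because correspondences come from the predictor rather than from nearest-neighbor search as in classic ICP. With real, noisy predictions, re-predicting correspondences after each update is what gives the later iterations something to correct.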
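The linear-complexity claim for the RGB and Points SSMs can be illustrated with a generic discrete state space model. The sketch below is a plain diagonal, time-invariant SSM, not the paper's RGB or Points SSM. The names `ssm_scan` and `ssm_naive` and the parameters `a`, `B`, `C` are hypothetical, and the sketch assumes that pixels or points have already been serialized into a sequence of L tokens. `ssm_scan` evaluates the recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t in a single pass, so its cost grows linearly with L. `ssm_naive` computes the same causal map as an explicit convolution with O(L^2) pairwise terms, the quadratic scaling that pairwise self-attention over all tokens would also incur.

```python
import numpy as np


def ssm_scan(x, a, B, C):
    """Diagonal linear SSM evaluated as a recurrence: O(L) sequential steps.

    x: (L, D_in) token sequence (e.g. serialized pixel or point features).
    a: (N,) diagonal of the state matrix A; |a| < 1 keeps the state bounded.
    B: (N, D_in) input matrix.
    C: (D_out, N) output matrix.
    Returns y: (L, D_out).
    """
    h = np.zeros(a.shape[0])
    y = np.empty((x.shape[0], C.shape[0]))
    for t in range(x.shape[0]):
        h = a * h + B @ x[t]  # h_t = A h_{t-1} + B x_t, with A = diag(a)
        y[t] = C @ h          # y_t = C h_t
    return y


def ssm_naive(x, a, B, C):
    """Same causal map as an explicit convolution: O(L^2) pairwise terms."""
    y = np.zeros((x.shape[0], C.shape[0]))
    for t in range(x.shape[0]):
        for k in range(t + 1):
            # contribution of token t-k after k state transitions: C A^k B x_{t-k}
            y[t] += C @ ((a ** k) * (B @ x[t - k]))
    return y


# Impulse response of a 1-D state with a = 0.5: decays geometrically.
y_impulse = ssm_scan(np.array([[1.0], [0.0], [0.0]]),
                     np.array([0.5]), np.array([[1.0]]), np.array([[1.0]]))
# y_impulse[:, 0] == [1.0, 0.5, 0.25]

# Random inputs for checking that the O(L) scan and the O(L^2) convolution agree.
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(16, 4))
a_demo = rng.uniform(0.1, 0.9, size=8)
B_demo = rng.normal(size=(8, 4))
C_demo = rng.normal(size=(3, 8))
```

Selective SSMs such as Mamba make `a`, `B`, and `C` depend on the input token, which this sketch deliberately omits; the recurrence still visits each token once, which is where the linear cost comes from.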
