PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment

Abstract

Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods such as handcrafted keypoint matching, RANSAC, and bundle adjustment. In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. This novel view of an old problem has several advantages. (i) The nature of the diffusion framework mirrors the iterative procedure of bundle adjustment. (ii) The formulation allows a seamless integration of geometric constraints from epipolar geometry. (iii) It excels in typically difficult scenarios such as sparse views with wide baselines. (iv) The method can predict intrinsics and extrinsics for an arbitrary number of images. We demonstrate that our method, PoseDiffusion, significantly improves over classic SfM pipelines and learned approaches on two real-world datasets. Finally, we observe that our method can generalize across datasets without further training. Project page: https://posediffusion.github.io/
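
As a rough illustration of the kind of geometric constraint that epipolar geometry provides, the sketch below builds an essential matrix from a relative camera pose and scores point correspondences with the Sampson error, a standard first-order approximation of the epipolar error. The abstract does not state which error measure PoseDiffusion uses or how it enters the diffusion process, so the choice of Sampson error, the function names (`skew`, `essential_matrix`, `sampson_error`), and the pose convention `x2 = R x1 + t` are all assumptions made for this example.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([
        [0.0, -t[2], t[1]],
        [t[2], 0.0, -t[0]],
        [-t[1], t[0], 0.0],
    ])


def essential_matrix(R, t):
    """Essential matrix E = [t]_x R for the assumed convention x2 = R x1 + t."""
    return skew(t) @ R


def sampson_error(E, x1, x2):
    """Sampson approximation of the epipolar error for each correspondence.

    x1, x2: (N, 3) arrays of homogeneous, normalized image points
    (pixel coordinates with the intrinsics already removed).
    """
    Ex1 = x1 @ E.T   # row i is E @ x1[i]
    Etx2 = x2 @ E    # row i is E.T @ x2[i]
    # Squared algebraic epipolar residual x2^T E x1.
    numerator = np.sum(x2 * Ex1, axis=1) ** 2
    # Squared gradient norm of the residual with respect to the image points.
    denominator = (Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2
                   + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2)
    return numerator / denominator
```

For noise-free correspondences and the true relative pose the error is zero up to floating-point precision, and it grows as the candidate pose departs from the true one, which is what makes a residual of this kind usable as a signal for refining pose estimates.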
