PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment

Abstract

Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods such as handcrafted keypoint matching, RANSAC, and bundle adjustment. In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. This novel view of an old problem has several advantages. (i) The nature of the diffusion framework mirrors the iterative procedure of bundle adjustment. (ii) The formulation allows a seamless integration of geometric constraints from epipolar geometry. (iii) It excels in typically difficult scenarios such as sparse views with wide baselines. (iv) The method can predict intrinsics and extrinsics for an arbitrary number of images. We demonstrate that our method, PoseDiffusion, significantly improves over classic SfM pipelines and learned approaches on two real-world datasets. Finally, we observe that our method can generalize across datasets without further training. Project page: https://posediffusion.github.io/
