RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes

Abstract

Although COLMAP has long been the predominant method for camera parameter optimization in static scenes, it is constrained by its lengthy runtime and by its reliance on ground-truth (GT) motion masks when applied to dynamic scenes. Many efforts have attempted to improve on it by incorporating additional priors as supervision, such as GT focal length, motion masks, 3D point clouds, camera poses, and metric depth; these, however, are typically unavailable in casually captured RGB videos. In this paper, we propose a novel method for more accurate and efficient camera parameter optimization in dynamic scenes, supervised solely by a single RGB video. Our method consists of three key components: (1) Patch-wise Tracking Filters, which establish robust and maximally sparse hinge-like relations across the RGB video; (2) Outlier-aware Joint Optimization, which optimizes camera parameters efficiently by adaptively down-weighting moving outliers, without relying on motion priors; and (3) a Two-stage Optimization Strategy, which enhances stability and optimization speed through a trade-off between the Softplus limits and convex minima in the losses. We evaluate our camera estimates both visually and numerically. To further validate accuracy, we feed the camera estimates into a 4D reconstruction method and assess the resulting 3D scenes as well as the rendered 2D RGB and depth maps. We perform experiments on four real-world datasets (NeRF-DS, DAVIS, iPhone, and TUM-dynamics) and one synthetic dataset (MPI-Sintel), demonstrating that our method estimates camera parameters more efficiently and accurately with a single RGB video as the only supervision.
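The abstract does not give the formulation behind the adaptive down-weighting of moving outliers, so the sketch below is only a generic illustration of the idea and not our actual objective. It assumes a toy problem (estimating a single 2D translation from point correspondences) and uses iteratively reweighted least squares with a Cauchy-style weight; the function name `robust_translation`, the weight function, and the `scale` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def robust_translation(src, dst, scale=1.0, iters=20):
    """Toy IRLS: estimate t in dst ~ src + t while down-weighting outliers.

    Illustrative only: the Cauchy-style weight and `scale` are assumptions,
    not the loss used by the method described in the abstract.
    """
    diff = dst - src                          # per-point displacement, shape (N, 2)
    t = np.median(diff, axis=0)               # robust initial guess
    weights = np.ones(len(diff))
    for _ in range(iters):
        residual = np.linalg.norm(diff - t, axis=1)
        # Large residuals (e.g. points on moving objects) get small weights.
        weights = 1.0 / (1.0 + (residual / scale) ** 2)
        # Weighted least-squares update of the translation.
        t = (weights[:, None] * diff).sum(axis=0) / weights.sum()
    return t, weights

rng = np.random.default_rng(0)
src = rng.uniform(0.0, 100.0, size=(50, 2))
dst = src + np.array([3.0, -2.0]) + rng.normal(0.0, 0.1, size=(50, 2))
dst[:10] += np.array([25.0, 15.0])            # first 10 points sit on a "moving object"

t, weights = robust_translation(src, dst)
```

In this toy setting the ten displaced points end up with weights near zero, so the estimate is driven by the static points without any motion mask being supplied.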
