Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

Abstract

We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they repeat triangulation of sparse 3D points and registration of more camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as an iterated application and refinement of a visual relocalizer, that is, of a method that registers new views to the current state of the reconstruction. This perspective allows us to investigate alternative visual relocalizers that are not rooted in local feature matching. We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images. Unlike other learning-based reconstruction methods, we require neither pose priors nor sequential inputs, and we optimize efficiently over thousands of images. Our method, ACE0 (ACE Zero), estimates camera poses to an accuracy comparable to feature-based SfM, as demonstrated by novel view synthesis. Project page: https://nianticlabs.github.io/acezero/
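The abstract treats registering a new view as relocalization against the current scene model. A minimal sketch of that single step follows. It assumes a scene coordinate regressor has already predicted one 3D world point per sampled pixel; here those predictions are replaced by synthetic points, 30% of which are deliberately wrong. The camera pose is then recovered with a RANSAC loop around a linear DLT solver. This DLT/RANSAC pair is a textbook stand-in and not necessarily the solver ACE0 uses. The function names `pose_dlt`, `reprojection_errors`, and `ransac_register` and all parameters (6-point samples, 200 iterations, 2 px threshold) are hypothetical.

```python
import numpy as np


def pose_dlt(pixels, scene_coords, K):
    """Hypothetical helper: world-to-camera pose (R, t) from >= 6 pixel/3D pairs via DLT."""
    n = len(pixels)
    # Convert pixels to normalized image coordinates using the intrinsics K.
    rays = np.hstack([pixels, np.ones((n, 1))]) @ np.linalg.inv(K).T
    x = rays[:, :2] / rays[:, 2:3]
    X = np.hstack([scene_coords, np.ones((n, 1))])
    # Each correspondence gives two linear constraints on the 12 entries of P ~ [R | t].
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X
    A[0::2, 8:12] = -x[:, :1] * X
    A[1::2, 4:8] = X
    A[1::2, 8:12] = -x[:, 1:2] * X
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)  # null vector of A
    # P = lambda * [R | t] with unknown scale and sign; project onto a proper rotation.
    M = P[:, :3]
    U, S, Vt = np.linalg.svd(M)
    sign = 1.0 if np.linalg.det(M) > 0 else -1.0
    return sign * (U @ Vt), sign * P[:, 3] / S.mean()


def reprojection_errors(R, t, pixels, scene_coords, K):
    """Pixel distance between observed pixels and reprojected scene coordinates."""
    cam = scene_coords @ R.T + t
    err = np.full(len(cam), np.inf)  # points behind the camera count as outliers
    ok = cam[:, 2] > 1e-8
    proj = cam[ok] @ K.T
    err[ok] = np.linalg.norm(proj[:, :2] / proj[:, 2:3] - pixels[ok], axis=1)
    return err


def ransac_register(pixels, scene_coords, K, iters=200, thresh_px=2.0, seed=0):
    """Hypothetical relocalizer: robustly register one view against predicted scene coordinates."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pixels), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(pixels), size=6, replace=False)  # minimal DLT sample
        R, t = pose_dlt(pixels[idx], scene_coords[idx], K)
        inliers = reprojection_errors(R, t, pixels, scene_coords, K) < thresh_px
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        raise RuntimeError("registration failed: too few inliers")
    R, t = pose_dlt(pixels[best], scene_coords[best], K)  # refit on all inliers
    return R, t, best


# Synthetic demo: stand-in for a regressor's predictions on one image.
rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.3
R_gt = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t_gt = np.array([0.1, -0.2, 4.0])
scene = rng.uniform(-1.0, 1.0, size=(100, 3))
cam = scene @ R_gt.T + t_gt
pix = cam @ K.T
pix = pix[:, :2] / pix[:, 2:3]
# Corrupt the first 30 predictions by shifting them sideways in the camera frame.
cam_bad = cam[:30].copy()
cam_bad[:, :2] += rng.uniform(0.5, 1.0, size=(30, 2)) * rng.choice([-1.0, 1.0], size=(30, 2))
scene[:30] = (cam_bad - t_gt) @ R_gt  # back to world coordinates
R, t, inliers = ransac_register(pix, scene, K)
```

In the incremental reading of SfM, a step like this would be attempted for every not-yet-registered image; views that register reliably (for example, with a high inlier count, an assumed criterion) join the reconstruction, the regressor is refined on them, and the loop repeats.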

