LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs

Abstract

A critical obstacle preventing NeRF models from being deployed broadly in the wild is their reliance on accurate camera poses. Consequently, there is growing interest in extending NeRF models to jointly optimize camera poses and scene representation. This offers an alternative to off-the-shelf SfM pipelines, which have well-understood failure modes. Existing approaches for unposed NeRF operate under limited assumptions, such as a prior pose distribution or coarse pose initialization, making them less effective in a general setting. In this work, we propose a novel approach, LU-NeRF, that jointly estimates camera poses and neural radiance fields with relaxed assumptions on pose configuration. Our approach operates in a local-to-global manner. We first optimize over local subsets of the data, dubbed mini-scenes, and LU-NeRF estimates local pose and geometry for this challenging few-shot task. A robust pose synchronization step then brings the mini-scene poses into a global reference frame, after which a final global optimization of pose and scene can be performed. We show that our LU-NeRF pipeline outperforms prior attempts at unposed NeRF without making restrictive assumptions on the pose prior. Unlike the baselines, this allows us to operate in the general SE(3) pose setting. Our results also indicate that our model can be complementary to feature-based SfM pipelines, as it compares favorably to COLMAP on low-texture and low-resolution images.
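The abstract does not say how the robust pose synchronization step is implemented. The sketch here only illustrates what "synchronization" means: recovering absolute orientations in one global frame from pairwise relative measurements, some of which may be outliers. It makes strong simplifying assumptions. It handles planar rotations (SO(2)) only, not the SE(3) mini-scene poses LU-NeRF works with. It also swaps in a generic technique, a spectral estimate refined by iteratively reweighted, Cauchy-style edge weights, that is not claimed to be the paper's method. The function names `synchronize` and `_leading_eigvec` and the parameters `rounds` and `c` are hypothetical.

```python
import cmath
import math
import random

# Illustrative sketch only (assumption): robust synchronization of planar
# rotations, SO(2). LU-NeRF synchronizes full SE(3) mini-scene poses; the
# spectral + reweighting scheme here is a generic stand-in, not its method.


def _leading_eigvec(H, iters=300, seed=0):
    """Power iteration on H + n*I.

    Every eigenvalue of H lies in [-n, n] (each row has n entries of modulus
    <= 1), so the shift makes the spectrum non-negative and the top
    eigenvector of H becomes the dominant one.
    """
    n = len(H)
    rng = random.Random(seed)
    v = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    for _ in range(iters):
        w = [sum(H[i][k] * v[k] for k in range(n)) + n * v[i] for i in range(n)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    return v


def synchronize(n, relative, rounds=5, c=0.1):
    """Estimate absolute angles theta_0..theta_{n-1} from pairwise measurements.

    relative maps an edge (i, j) to a measured value of theta_i - theta_j
    (radians). Returns angles with the gauge fixed so that theta_0 = 0.
    """
    weights = {edge: 1.0 for edge in relative}
    for _ in range(rounds):
        # Weighted Hermitian matrix with H[i][j] ~ exp(1j * (theta_i - theta_j)).
        H = [[0j] * n for _ in range(n)]
        for i in range(n):
            H[i][i] = 1 + 0j
        for (i, j), ang in relative.items():
            z = weights[(i, j)] * cmath.exp(1j * ang)
            H[i][j] = z
            H[j][i] = z.conjugate()
        # Spectral step: the leading eigenvector, projected onto the unit
        # circle, gives exp(1j * theta_i) up to one global phase.
        est = [x / abs(x) for x in _leading_eigvec(H)]
        # Reweighting step: Cauchy-style weights shrink edges whose measurement
        # disagrees with the current estimate (likely outliers).
        for (i, j), ang in relative.items():
            r = abs(cmath.phase(est[i] * est[j].conjugate() * cmath.exp(-1j * ang)))
            weights[(i, j)] = 1.0 / (1.0 + (r / c) ** 2)
    ref = est[0]
    return [cmath.phase(z / ref) for z in est]


if __name__ == "__main__":
    true = [0.0, 0.7, -1.2, 2.5, 0.3]
    rel = {(i, j): true[i] - true[j] for i in range(5) for j in range(i + 1, 5)}
    rel[(1, 3)] += 2.0  # corrupt one measurement to simulate an outlier
    print([round(a, 3) for a in synchronize(5, rel)])
```

Relative measurements cannot pin down a global rotation, so the result is defined only up to a common offset. That is why the sketch fixes the gauge by expressing every angle relative to node 0. A real SE(3) pipeline would also have to handle translations and, for monocular mini-scenes, per-scene scale, which this toy ignores.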

