Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields

Abstract

We propose an algorithm that jointly refines camera poses and scene geometry represented by a decomposed low-rank tensor, using only 2D images as supervision. We first conduct a pilot study on a 1D signal and relate its findings to the 3D setting, where naive joint pose optimization on voxel-based NeRFs can easily converge to sub-optimal solutions. Based on an analysis of the frequency spectrum, we then apply convolutional Gaussian filters to the 2D and 3D radiance fields to obtain a coarse-to-fine training schedule that enables joint camera pose optimization. By exploiting the decomposition property of the low-rank tensor, our method achieves an effect equivalent to brute-force 3D convolution while incurring only a small computational overhead. To further improve the robustness and stability of joint optimization, we also propose smoothed 2D supervision, randomly scaled kernel parameters, and an edge-guided loss mask. Extensive quantitative and qualitative evaluations show that the proposed framework achieves superior performance in novel view synthesis together with rapid convergence of the optimization.
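
The claim that filtering a decomposed tensor can match brute-force 3D convolution can be checked on a toy example. The abstract does not spell out the decomposition, so the sketch below assumes a CP-style form (a sum of rank-one outer products of three 1D vectors) and a separable Gaussian kernel with zero padding; all function names are hypothetical and the code is an illustration of the underlying identity, not the authors' implementation. It blurs each 1D factor separately and compares the result against a dense 3D convolution of the expanded grid.

```python
import math
import random


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def conv1d_same(signal, kernel):
    """Zero-padded 'same' 1D convolution with a symmetric, odd-length kernel."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def cp_to_dense(factors, n):
    """Expand CP factors [(a, b, c), ...] into a dense n x n x n grid."""
    return [[[sum(a[x] * b[y] * c[z] for a, b, c in factors)
              for z in range(n)] for y in range(n)] for x in range(n)]


def conv3d_same(volume, kernel):
    """Brute-force zero-padded 3D convolution with the kernel g(x) g(y) g(z)."""
    n = len(volume)
    radius = len(kernel) // 2
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                acc = 0.0
                for i, wi in enumerate(kernel):
                    xs = x + i - radius
                    if not 0 <= xs < n:
                        continue
                    for j, wj in enumerate(kernel):
                        ys = y + j - radius
                        if not 0 <= ys < n:
                            continue
                        for k, wk in enumerate(kernel):
                            zs = z + k - radius
                            if 0 <= zs < n:
                                acc += wi * wj * wk * volume[xs][ys][zs]
                out[x][y][z] = acc
    return out


def max_abs_diff(u, v):
    """Largest element-wise absolute difference between two cubic grids."""
    n = len(u)
    return max(abs(u[x][y][z] - v[x][y][z])
               for x in range(n) for y in range(n) for z in range(n))


def demo(n=6, rank=3, sigma=1.0, radius=2, seed=0):
    """Return the gap between dense 3D blurring and per-factor 1D blurring."""
    rng = random.Random(seed)
    # Random rank-`rank` CP factors: each component is a triple of 1D vectors.
    factors = [tuple([rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(3))
               for _ in range(rank)]
    kernel = gaussian_kernel(sigma, radius)
    # Expensive path: expand to a dense grid, then convolve in 3D.
    dense_blur = conv3d_same(cp_to_dense(factors, n), kernel)
    # Cheap path: blur every 1D factor, then expand.
    blurred_factors = [tuple(conv1d_same(v, kernel) for v in f) for f in factors]
    factored_blur = cp_to_dense(blurred_factors, n)
    return max_abs_diff(dense_blur, factored_blur)
```

Calling `demo()` returns a difference at the level of floating-point round-off, because a separable kernel distributes over each rank-one term. In this toy setting the factored path needs only 1D convolutions over `3 * rank` vectors of length `n`, whereas the dense path visits every voxel with a full 3D kernel; a coarse-to-fine schedule would then correspond to shrinking `sigma` over the course of training.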
