LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis

Abstract

Recent work has shown that neural networks can perform 3D tasks such as Novel View Synthesis (NVS) without explicit 3D reconstruction. Even so, we argue that strong 3D inductive biases remain helpful in the design of such networks. We demonstrate this by introducing LagerNVS, an encoder-decoder neural network for NVS that builds on "3D-aware" latent features. The encoder is initialized from a 3D reconstruction network pre-trained with explicit 3D supervision. It is paired with a lightweight decoder, and the two are trained end-to-end with photometric losses. LagerNVS achieves state-of-the-art deterministic feed-forward NVS (including 31.4 PSNR on Re10k), both with and without known cameras. It also renders in real time, generalizes to in-the-wild data, and can be paired with a diffusion decoder for generative extrapolation.
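For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how PSNR is conventionally computed between a rendered view and its ground-truth image. It is not the authors' evaluation code: the function name `psnr`, the flat pixel lists, and the assumption that intensities lie in [0, 1] are all illustrative choices.

```python
import math


def psnr(rendered, target, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Assumes intensities lie in [0, max_value]; this range is an
    illustrative assumption, not something stated in the abstract.
    """
    if len(rendered) != len(target) or not rendered:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under these assumptions, a single image scoring 31.4 dB has a mean squared error of about 7.2e-4, or a root-mean-square pixel error of roughly 0.027 on the [0, 1] scale. Benchmark figures are typically averaged over many images, so that conversion is only indicative.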
