Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations

Abstract

Neural Radiance Fields (NeRFs) have proven to be powerful 3D representations, capable of high-quality novel view synthesis of complex scenes. While NeRFs have been applied to graphics, vision, and robotics, slow rendering and characteristic visual artifacts prevent their adoption in many use cases. In this work, we investigate combining an autoencoder (AE) with a NeRF, in which latent features (instead of colours) are rendered and then convolutionally decoded. The resulting latent-space NeRF can produce novel views of higher quality than standard colour-space NeRFs, as the AE can correct certain visual artifacts, while rendering over three times faster. Our work is orthogonal to other techniques for improving NeRF efficiency. Further, we can control the tradeoff between efficiency and image quality by shrinking the AE architecture, achieving over 13 times faster rendering with only a small drop in performance. We hope that our approach can form the basis of an efficient, yet high-fidelity, 3D scene representation for downstream tasks, especially when retaining differentiability is useful, as in many robotics scenarios requiring continual learning.
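
The idea of rendering latent features and then decoding them convolutionally can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`render_latent`, `conv3x3`), the single 3x3 convolution standing in for the AE decoder, and all weights are hypothetical, and standard NeRF-style alpha compositing is assumed for the rendering step.

```python
import math


def render_latent(densities, features, deltas):
    """Alpha-composite per-sample latent feature vectors along one ray.

    Assumes standard NeRF-style volume rendering, applied to latent
    features instead of RGB colours.
    """
    transmittance = 1.0
    out = [0.0] * len(features[0])
    for sigma, feat, delta in zip(densities, features, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        weight = transmittance * alpha          # contribution to the pixel
        for c, value in enumerate(feat):
            out[c] += weight * value
        transmittance *= 1.0 - alpha            # light remaining after sample
    return out


def conv3x3(feature_map, kernels, bias):
    """Zero-padded 3x3 convolution mapping H x W x C_in to H x W x C_out.

    A hypothetical stand-in for the AE decoder; kernels[o][dy][dx][c]
    holds the weight for output channel o and input channel c.
    """
    h, w = len(feature_map), len(feature_map[0])
    c_in = len(feature_map[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            pixel = []
            for o, kernel in enumerate(kernels):
                acc = bias[o]
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            for c in range(c_in):
                                acc += kernel[dy + 1][dx + 1][c] * feature_map[yy][xx][c]
                pixel.append(acc)
            row.append(pixel)
        out.append(row)
    return out
```

In such a pipeline, `render_latent` would be called once per pixel to build a latent feature map, which `conv3x3` (or a deeper decoder) would then turn into colours. Both steps are built from differentiable operations, in keeping with the differentiability the abstract highlights.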
