What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs

Abstract

3D-aware Generative Adversarial Networks (GANs) have shown remarkable progress in learning to generate multi-view-consistent images and 3D geometries of scenes from collections of 2D images via neural volume rendering. Yet, the significant memory and computational costs of dense sampling in volume rendering have forced 3D GANs to adopt patch-based training or employ low-resolution rendering with post-processing 2D super-resolution, which sacrifices multi-view consistency and the quality of resolved geometry. Consequently, 3D GANs have not yet been able to fully resolve the rich 3D geometry present in 2D images. In this work, we propose techniques to scale neural volume rendering to the much higher resolution of native 2D images, thereby resolving fine-grained 3D geometry with unprecedented detail. Our approach employs learning-based samplers for accelerating neural rendering for 3D GAN training using up to 5 times fewer depth samples. This enables us to explicitly "render every pixel" of the full-resolution image during training and inference without post-processing super-resolution in 2D. Together with our strategy to learn high-quality surface geometry, our method synthesizes high-resolution 3D geometry and strictly view-consistent images while maintaining image quality on par with baselines relying on post-processing super-resolution. We demonstrate state-of-the-art 3D geometric quality on FFHQ and AFHQ, setting a new standard for unsupervised learning of 3D shapes in 3D GANs.
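To make the cost of dense depth sampling concrete, the following is a minimal sketch of the standard volume-rendering quadrature along a single ray, as commonly used in the neural rendering literature. It is not the learned sampler proposed in this work. The toy scene (`toy_density`, `toy_color`), the helper names, and the sample counts of 96 and 20 are all hypothetical choices for illustration; the sparse depths are placed by hand near the surface, standing in for what a sampler might propose.

```python
import math


def linspace(start, stop, count):
    """Return `count` evenly spaced values from start to stop inclusive."""
    return [start + (stop - start) * i / (count - 1) for i in range(count)]


def render_ray(edges, density_fn, color_fn):
    """Standard volume-rendering quadrature along one ray.

    `edges` are increasing depths; each consecutive pair bounds one interval,
    and density and color are evaluated at the interval midpoint.
    Returns (color, expected_depth, weights).
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    color = 0.0
    depth = 0.0
    weights = []
    for near, far in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (near + far)
        # Opacity contributed by this interval.
        alpha = 1.0 - math.exp(-density_fn(mid) * (far - near))
        weight = transmittance * alpha
        weights.append(weight)
        color += weight * color_fn(mid)
        depth += weight * mid
        transmittance *= 1.0 - alpha
    return color, depth, weights


# Hypothetical toy scene: empty space up to depth 1.0, then a dense solid.
def toy_density(t):
    return 200.0 if t >= 1.0 else 0.0


def toy_color(t):
    return 0.8  # constant grayscale intensity


# Dense baseline: 96 intervals spread uniformly over the whole ray.
dense = render_ray(linspace(0.0, 2.0, 97), toy_density, toy_color)

# Sparse alternative: 20 intervals concentrated around the surface.
# The interval [0.9, 1.1] is hand-picked here; it is an assumption standing
# in for the output of a sampler, not something computed by this sketch.
sparse = render_ray(linspace(0.9, 1.1, 21), toy_density, toy_color)

print("dense  color, depth:", dense[0], dense[1])
print("sparse color, depth:", sparse[0], sparse[1])
```

In this toy setting both runs recover a depth close to the surface at 1.0, because almost all of the rendering weight falls in a narrow band around the surface. Samples placed in empty space contribute nothing, which is the intuition behind spending fewer depth samples per ray when their placement is well informed.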
