
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

March 18, 2024
Authors: Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy
cs.AI

Abstract

The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks.
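To make the two-stage design described above more concrete, here is a minimal, illustrative PyTorch sketch of a latent 3D diffusion pipeline: a VAE encodes an image into compact latent tokens, a transformer-based decoder conditions a neural field on the latent, and a diffusion model is trained directly in the latent space. All module names, dimensions, and the simple mean-pooled field conditioning are assumptions made for exposition; this is not the authors' LN3Diff implementation.

```python
# Illustrative sketch only; NOT the authors' LN3Diff code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageToLatentVAE(nn.Module):
    """Encode an RGB image into a set of compact latent tokens (hypothetical encoder)."""

    def __init__(self, latent_tokens=32, latent_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_mu = nn.Linear(128, latent_tokens * latent_dim)
        self.to_logvar = nn.Linear(128, latent_tokens * latent_dim)
        self.latent_tokens, self.latent_dim = latent_tokens, latent_dim

    def forward(self, img):
        h = self.backbone(img)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return z.view(-1, self.latent_tokens, self.latent_dim), mu, logvar


class LatentToNeuralField(nn.Module):
    """Transformer decoder that conditions an MLP field (RGB + density) on the latent."""

    def __init__(self, latent_dim=16, width=256):
        super().__init__()
        self.proj = nn.Linear(latent_dim, width)
        layer = nn.TransformerEncoderLayer(d_model=width, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.field = nn.Sequential(
            nn.Linear(3 + width, width), nn.SiLU(),
            nn.Linear(width, width), nn.SiLU(),
            nn.Linear(width, 4),  # RGB + density per query point
        )

    def forward(self, z, xyz):
        # z: (B, T, latent_dim) latent tokens, xyz: (B, N, 3) 3D query points
        tokens = self.transformer(self.proj(z))                 # (B, T, width)
        cond = tokens.mean(dim=1, keepdim=True).expand(-1, xyz.shape[1], -1)
        return self.field(torch.cat([xyz, cond], dim=-1))       # (B, N, 4)


class LatentDenoiser(nn.Module):
    """Predicts the noise added to the latent tokens at timestep t (DDPM-style)."""

    def __init__(self, latent_dim=16, width=256):
        super().__init__()
        self.in_proj = nn.Linear(latent_dim, width)
        self.time_emb = nn.Sequential(nn.Linear(1, width), nn.SiLU(), nn.Linear(width, width))
        layer = nn.TransformerEncoderLayer(d_model=width, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.out_proj = nn.Linear(width, latent_dim)

    def forward(self, z_t, t):
        temb = self.time_emb(t.float().view(-1, 1, 1) / 1000.0)  # (B, 1, width)
        return self.out_proj(self.transformer(self.in_proj(z_t) + temb))


if __name__ == "__main__":
    vae, decoder, denoiser = ImageToLatentVAE(), LatentToNeuralField(), LatentDenoiser()
    img = torch.randn(2, 3, 128, 128)

    # Stage 1: image -> compact 3D-aware latent -> neural field queried at 3D points.
    z, mu, logvar = vae(img)
    rgb_sigma = decoder(z, torch.rand(2, 1024, 3) * 2 - 1)       # (2, 1024, 4)

    # Stage 2: standard DDPM forward process and noise-prediction loss on the latent.
    betas = torch.linspace(1e-4, 0.02, 1000)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, 1000, (2,))
    noise = torch.randn_like(z)
    a = alpha_bar[t].view(-1, 1, 1)
    z_t = a.sqrt() * z + (1.0 - a).sqrt() * noise
    loss = F.mse_loss(denoiser(z_t, t), noise)
    print(rgb_sigma.shape, loss.item())
```

In the actual method, the decoded neural field would be supervised through differentiable rendering and the denoiser would take conditioning inputs (e.g., an image or text embedding); those parts are omitted here for brevity.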
