LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
March 18, 2024
Authors: Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy
cs.AI
Abstract
The field of neural rendering has witnessed significant progress with
advancements in generative models and differentiable rendering techniques.
Though 2D diffusion has achieved success, a unified 3D diffusion pipeline
remains unsettled. This paper introduces a novel framework called LN3Diff to
address this gap and enable fast, high-quality, and generic conditional 3D
generation. Our approach harnesses a 3D-aware architecture and variational
autoencoder (VAE) to encode the input image into a structured, compact, and
3D-aware latent space. The latent is decoded by a transformer-based decoder into a
high-capacity 3D neural field. Through training a diffusion model on this
3D-aware latent space, our method achieves state-of-the-art performance on
ShapeNet for 3D generation and demonstrates superior performance in monocular
3D reconstruction and conditional 3D generation across various datasets.
Moreover, it surpasses existing 3D diffusion methods in terms of inference
speed, requiring no per-instance optimization. Our proposed LN3Diff presents a
significant advancement in 3D generative modeling and holds promise for various
applications in 3D vision and graphics tasks.
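The abstract describes a three-stage pipeline: a VAE-style encoder maps an input image to a compact, 3D-aware latent; a transformer-based decoder expands that latent into a high-capacity neural field; and a diffusion model is trained in the latent space so that new latents can be sampled and decoded without per-instance optimization. The sketch below illustrates only the encode/decode portion of that pipeline in PyTorch. Every module name, layer size, token count, and the use of plain self-attention layers in place of the paper's decoder are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an LN3Diff-style encode/decode pipeline (assumptions, not
# the paper's code): image -> compact latent (VAE) -> transformer -> field features.
import torch
import torch.nn as nn

class ImageToLatentEncoder(nn.Module):
    """Hypothetical VAE-style encoder: image -> mean/log-variance of a compact latent."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)

    def forward(self, img):
        h = self.backbone(img)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return z, mu, logvar

class LatentToFieldDecoder(nn.Module):
    """Hypothetical decoder: latent -> tokens -> self-attention -> per-token field features.
    A stack of TransformerEncoder layers stands in for the paper's transformer-based decoder."""
    def __init__(self, latent_dim=256, num_tokens=96, feat_dim=32):
        super().__init__()
        self.num_tokens, self.latent_dim = num_tokens, latent_dim
        self.to_tokens = nn.Linear(latent_dim, num_tokens * latent_dim)
        layer = nn.TransformerEncoderLayer(latent_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.to_field = nn.Linear(latent_dim, feat_dim)

    def forward(self, z):
        tokens = self.to_tokens(z).view(-1, self.num_tokens, self.latent_dim)
        tokens = self.transformer(tokens)
        return self.to_field(tokens)  # field features to be queried/rendered downstream

# Usage: encode an image, decode its latent into neural-field features. In the
# full method, a diffusion model (not shown) would be trained to denoise z, so
# a sampled latent is decoded in one forward pass with no per-instance optimization.
img = torch.randn(1, 3, 128, 128)
z, mu, logvar = ImageToLatentEncoder()(img)
field_feats = LatentToFieldDecoder()(z)
print(field_feats.shape)  # torch.Size([1, 96, 32])
```

Training diffusion in this compact latent space, rather than on rendered images or raw 3D grids, is what the abstract credits for the fast inference: sampling happens in a low-dimensional space and decoding is a single feed-forward pass.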