Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models
June 18, 2024
作者: Paul Henderson, Melonie de Almeida, Daniela Ivanova, Titas Anciukevičius
cs.AI
Abstract
We present a latent diffusion model over 3D scenes that can be trained using only 2D image data. To achieve this, we first design an autoencoder that maps multi-view images to 3D Gaussian splats and simultaneously builds a compressed latent representation of these splats. Then, we train a multi-view diffusion model over the latent space to learn an efficient generative model. This pipeline does not require object masks or depths, and is suitable for complex scenes with arbitrary camera positions. We conduct careful experiments on two large-scale datasets of complex real-world scenes, MVImgNet and RealEstate10K. We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, from a single input view, or from sparse input views. It produces diverse and high-quality results while running an order of magnitude faster than non-latent diffusion models and earlier NeRF-based generative models.
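The abstract describes a two-stage pipeline: an autoencoder that turns multi-view images into 3D Gaussian splats together with a compressed latent, and a multi-view diffusion model trained over that latent space. The sketch below shows how those pieces could fit together at inference time. It is a minimal, hypothetical PyTorch illustration: the class names, tensor shapes, number of Gaussians, and the toy denoising loop are all assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

NUM_GAUSSIANS = 1024   # hypothetical number of splats per scene
SPLAT_PARAMS = 14      # e.g. 3 position + 3 scale + 4 rotation + 1 opacity + 3 colour
LATENT_DIM = 64        # hypothetical size of the compressed latent

class SplatAutoencoder(nn.Module):
    """Maps multi-view images to 3D Gaussian splats plus a compressed latent."""
    def __init__(self):
        super().__init__()
        # Placeholder networks; the real model would condition on camera poses
        # and use far larger multi-view backbones.
        self.encoder = nn.LazyLinear(LATENT_DIM)
        self.splat_head = nn.Linear(LATENT_DIM, NUM_GAUSSIANS * SPLAT_PARAMS)

    def encode(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, V, 3, H, W) posed multi-view inputs -> latent (B, LATENT_DIM)
        return self.encoder(images.reshape(images.shape[0], -1))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        # latent -> per-Gaussian parameters, to be rendered with any
        # differentiable Gaussian-splatting rasteriser
        return self.splat_head(z).reshape(-1, NUM_GAUSSIANS, SPLAT_PARAMS)

class LatentDenoiser(nn.Module):
    """Tiny stand-in for the multi-view diffusion model over the latent space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.SiLU(),
                                 nn.Linear(256, LATENT_DIM))

    @torch.no_grad()
    def sample(self, batch_size: int, steps: int = 50) -> torch.Tensor:
        # Highly simplified denoising loop: start from Gaussian noise and
        # iteratively refine it (a real sampler would follow a DDPM/DDIM schedule).
        z = torch.randn(batch_size, LATENT_DIM)
        for _ in range(steps):
            z = z - 0.1 * self.net(z)
        return z

# Unconditional generation: sample a latent with the diffusion model, then
# decode it into 3D Gaussian splats in a single forward pass.
autoencoder = SplatAutoencoder()
diffusion = LatentDenoiser()
z = diffusion.sample(batch_size=1)
scene = autoencoder.decode(z)
print(scene.shape)  # (1, 1024, 14)
```

Conditional generation from one or a few input views would follow the same decode path, with the denoiser additionally conditioned on latents encoded from the observed images.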