
Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models

June 18, 2024
Authors: Paul Henderson, Melonie de Almeida, Daniela Ivanova, Titas Anciukevičius
cs.AI

Abstract

We present a latent diffusion model over 3D scenes that can be trained using only 2D image data. To achieve this, we first design an autoencoder that maps multi-view images to 3D Gaussian splats, and simultaneously builds a compressed latent representation of these splats. Then, we train a multi-view diffusion model over the latent space to learn an efficient generative model. This pipeline requires neither object masks nor depths, and is suitable for complex scenes with arbitrary camera positions. We conduct careful experiments on two large-scale datasets of complex real-world scenes -- MVImgNet and RealEstate10K. We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, from a single input view, or from sparse input views. It produces diverse and high-quality results while running an order of magnitude faster than non-latent diffusion models and earlier NeRF-based generative models.
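
The abstract describes a two-stage pipeline: an autoencoder compresses posed multi-view images into a latent while decoding 3D Gaussian splat parameters, and a diffusion model is then trained over that latent space. The PyTorch sketch below is a minimal, hypothetical illustration of this structure only; the module names, layer widths, the 14-channel splat parameterization, and the simple noise-prediction loss are all assumptions made for illustration, not the authors' actual architecture.

```python
# Hypothetical sketch of the two-stage pipeline from the abstract.
# All shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class MultiViewAutoencoder(nn.Module):
    """Maps N posed views to (a) a compressed latent and (b) per-pixel
    3D Gaussian splat parameters decoded from that latent."""
    def __init__(self, latent_dim=64):
        super().__init__()
        # Encoder: images -> compressed latent grid (one per view).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, latent_dim, 4, stride=2, padding=1),
        )
        # Decoder: latent -> assumed 14 Gaussian parameters per pixel:
        # 3 (mean) + 3 (scale) + 4 (rotation quat) + 3 (color) + 1 (opacity).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(32, 14, 4, stride=2, padding=1),
        )

    def forward(self, views):                      # views: (B, N, 3, H, W)
        B, N, C, H, W = views.shape
        z = self.encoder(views.flatten(0, 1))      # (B*N, D, H/4, W/4)
        splats = self.decoder(z)                   # (B*N, 14, H, W)
        return z.unflatten(0, (B, N)), splats.unflatten(0, (B, N))

class LatentDenoiser(nn.Module):
    """Toy denoiser over the autoencoder's latent space (noise prediction)."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_dim, 128, 3, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_dim, 3, padding=1),
        )

    def forward(self, z_noisy, t):
        # A real model would condition on the timestep t and on camera
        # poses, and attend across views; both are omitted in this sketch.
        return self.net(z_noisy)

# Toy usage: encode views, corrupt one view's latent, predict the noise.
ae, dm = MultiViewAutoencoder(), LatentDenoiser()
views = torch.randn(1, 4, 3, 64, 64)               # 4 views of one scene
z, splats = ae(views)
noise = torch.randn_like(z[:, 0])
loss = nn.functional.mse_loss(dm(z[:, 0] + noise, t=None), noise)
```

Because generation happens in the compressed latent space and the decoder emits splat parameters in a single forward pass, sampling avoids per-scene optimization, which is consistent with the sub-second generation times the abstract reports.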
