BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

January 30, 2024
Authors: Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, Pan Ji
cs.AI

Abstract

We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained on datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into hybrid neural fields: a tri-plane containing the geometry features, followed by a multi-layer perceptron (MLP) that decodes the signed distance values. A variational auto-encoder compresses the tri-planes into a latent tri-plane space, on which the denoising diffusion process is performed. Applying diffusion to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks that overlap with the current scene and extrapolate the existing latent tri-planes to populate the new blocks. The extrapolation is done by conditioning the generation process on feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that blend harmoniously with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent, and unbounded large 3D scenes with unprecedentedly high-quality shapes in both indoor and outdoor scenarios.
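The extrapolation step described above, conditioning the denoising iterations on features from the overlapping tri-planes, can be illustrated with a minimal sketch. The abstract does not specify the exact conditioning procedure, so the code below uses a common replacement-style inpainting scheme (in the spirit of RePaint) as a stand-in: at each denoising step, the overlap region of the new block's latent is overwritten with a noised copy of the known features. Everything here is an assumption for illustration: the names `make_schedule`, `toy_denoiser`, and `extrapolate_block`, the latent shape `(3, C, H, W)`, the linear noise schedule, and the placeholder denoiser, which stands in for a trained network.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    """Linear DDPM noise schedule (an assumed choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod


def toy_denoiser(x_t, t):
    """Hypothetical placeholder for a trained noise-prediction network.

    A real model would predict the noise present in x_t; this stub
    returns zeros so the sketch runs without any trained weights.
    """
    return np.zeros_like(x_t)


def extrapolate_block(known, mask, num_steps=50, seed=0):
    """Fill a new latent tri-plane block, conditioned on an overlap region.

    known: array (3, C, H, W); latent features copied from the existing
           scene where mask == 1 (values elsewhere are ignored).
    mask:  array broadcastable to `known`; 1 in the overlap, 0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    betas, acp = make_schedule(num_steps)
    x = rng.standard_normal(known.shape)  # start from pure noise

    for t in range(num_steps - 1, -1, -1):
        # Noise the known features to the current level t and paste them
        # into the overlap, so the denoiser sees consistent context there.
        noise = rng.standard_normal(known.shape)
        known_t = np.sqrt(acp[t]) * known + np.sqrt(1.0 - acp[t]) * noise
        x = mask * known_t + (1.0 - mask) * x

        # Standard DDPM reverse step using the predicted noise.
        eps = toy_denoiser(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - acp[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean

    # Restore the clean known features exactly in the overlap region.
    return mask * known + (1.0 - mask) * x


# Usage: three 8x8 planes with 2 channels; the first two columns overlap
# the existing scene (a simplified, hypothetical overlap layout).
known = np.ones((3, 2, 8, 8))
mask = np.zeros_like(known)
mask[..., :2] = 1.0
new_block = extrapolate_block(known, mask, num_steps=10)
```

With a trained denoiser in place of `toy_denoiser`, the unmasked region would be generated to agree with the pasted overlap features, which is the intuition behind the seamless transitions the abstract reports. In a full pipeline the resulting latent would then be decoded by the VAE and the MLP into signed distance values.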