LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
February 7, 2024
Authors: Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
cs.AI
Abstract
3D content creation has achieved significant progress in terms of both
quality and speed. Although current feed-forward models can produce 3D objects
in seconds, their resolution is constrained by the intensive computation
required during training. In this paper, we introduce Large Multi-View Gaussian
Model (LGM), a novel framework designed to generate high-resolution 3D models
from text prompts or single-view images. Our key insights are two-fold: 1) 3D
Representation: We propose multi-view Gaussian features as an efficient yet
powerful representation, which can then be fused together for differentiable
rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput
backbone operating on multi-view images, which can be produced from text or
single-view image input by leveraging multi-view diffusion models. Extensive
experiments demonstrate the high fidelity and efficiency of our approach.
Notably, we maintain the fast speed of generating 3D objects within 5 seconds
while boosting the training resolution to 512, thereby achieving
high-resolution 3D content generation.
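The pipeline the abstract describes — multi-view images in, per-pixel Gaussian parameters out, fused into one point set for differentiable rendering — can be sketched as follows. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the layer sizes, the 14-channel Gaussian parameterization (position, opacity, scale, rotation, color), and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AsymmetricUNet(nn.Module):
    """Toy stand-in for the asymmetric U-Net backbone: it encodes each
    view, then decodes to a feature map *smaller* than the input (hence
    'asymmetric'), predicting 14 channels per pixel — an assumed split
    of xyz (3), opacity (1), scale (3), rotation quaternion (4), RGB (3)."""
    def __init__(self, in_ch=3, out_ch=14):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
        )
        # The decoder upsamples only once, so the Gaussian map ends up at
        # half the input resolution — cheaper than a symmetric U-Net.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, views):                       # views: (B, V, 3, H, W)
        B, V, C, H, W = views.shape
        x = views.reshape(B * V, C, H, W)           # fold views into the batch
        feats = self.decoder(self.encoder(x))       # (B*V, 14, H/2, W/2)
        # Turn every pixel of every view into one Gaussian, then fuse all
        # views into a single set of 3D Gaussians per object.
        g = feats.flatten(2).transpose(1, 2)        # (B*V, N_pix, 14)
        return g.reshape(B, -1, 14)                 # (B, V*N_pix, 14)

if __name__ == "__main__":
    model = AsymmetricUNet()
    # e.g. four views produced by a multi-view diffusion model
    four_views = torch.randn(2, 4, 3, 256, 256)
    gaussians = model(four_views)
    print(gaussians.shape)                          # torch.Size([2, 65536, 14])
    # The fused `gaussians` would then feed a differentiable Gaussian-splatting
    # renderer, whose rendering loss trains the U-Net end to end.
```

Because the loss is computed on rendered images rather than on a dense 3D volume, the per-view prediction cost grows only with image resolution and view count, which is what lets this kind of design push training resolution up while keeping inference in the seconds range.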