LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
February 7, 2024
Authors: Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
cs.AI
Abstract
3D content creation has achieved significant progress in terms of both
quality and speed. Although current feed-forward models can produce 3D objects
in seconds, their resolution is constrained by the intensive computation
required during training. In this paper, we introduce Large Multi-View Gaussian
Model (LGM), a novel framework designed to generate high-resolution 3D models
from text prompts or single-view images. Our key insights are two-fold: 1) 3D
Representation: We propose multi-view Gaussian features as an efficient yet
powerful representation, which can then be fused for differentiable
rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput
backbone operating on multi-view images, which can be produced from text or
single-view image input by leveraging multi-view diffusion models. Extensive
experiments demonstrate the high fidelity and efficiency of our approach.
Notably, we maintain the fast speed of generating 3D objects within 5 seconds
while boosting the training resolution to 512, thereby achieving
high-resolution 3D content generation.
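To make the two-fold design concrete, below is a minimal PyTorch-style sketch of the pipeline the abstract describes: an asymmetric U-Net maps multi-view images (as produced by a multi-view diffusion model) to per-pixel 3D Gaussian features, which are then fused across views into a single Gaussian set for differentiable rendering. All layer sizes, the 14-channel Gaussian layout (3 position + 1 opacity + 3 scale + 4 rotation + 3 color), and names such as `AsymmetricUNet` are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class AsymmetricUNet(nn.Module):
    """Encoder-decoder with a skip connection that downsamples more than it
    upsamples, so the output Gaussian feature map is coarser than the input
    image. This asymmetry keeps the Gaussian count (and memory) tractable
    as the training resolution grows."""

    def __init__(self, in_ch=3, gaussian_ch=14, base=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, 2, 1), nn.SiLU())         # H -> H/2
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, 2, 1), nn.SiLU())      # H/2 -> H/4
        self.enc3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, 2, 1), nn.SiLU())  # H/4 -> H/8
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1), nn.SiLU()                # H/8 -> H/4
        )
        # Stop upsampling at H/4 instead of returning to H: the "asymmetric" part.
        self.head = nn.Conv2d(base * 4, gaussian_ch, 3, 1, 1)

    def forward(self, x):                              # x: (B*V, 3, H, W)
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d = self.dec(self.enc3(e2))
        return self.head(torch.cat([d, e2], dim=1))    # (B*V, 14, H/4, W/4)


def images_to_gaussians(unet, views):
    """views: (B, V, 3, H, W) images from a multi-view diffusion model.
    Predicts per-pixel Gaussians for each view, then fuses them by simple
    concatenation into one Gaussian set per object: (B, V*(H/4)*(W/4), 14)."""
    B, V, C, H, W = views.shape
    feats = unet(views.reshape(B * V, C, H, W))              # per-view Gaussian maps
    feats = feats.reshape(B, V, 14, -1).permute(0, 1, 3, 2)  # (B, V, N, 14)
    return feats.reshape(B, -1, 14)                          # fuse across views


if __name__ == "__main__":
    unet = AsymmetricUNet()
    views = torch.randn(1, 4, 3, 256, 256)    # e.g. 4 synthesized views at 256x256
    gaussians = images_to_gaussians(unet, views)
    print(gaussians.shape)                    # torch.Size([1, 16384, 14])
```

In this sketch the fused tensor would then be handed to a differentiable Gaussian-splatting rasterizer to render training views; because the U-Net only upsamples back to a quarter of the input resolution, raising the input resolution quadruples the pixel count but not the Gaussian count, which is one plausible reading of how the speed/resolution trade-off in the abstract is managed.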