GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
April 30, 2024
作者: Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu
cs.AI
Abstract
We propose GS-LRM, a scalable large reconstruction model that can predict
high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23
seconds on a single A100 GPU. Our model features a very simple transformer-based
architecture; we patchify input posed images, pass the concatenated multi-view
image tokens through a sequence of transformer blocks, and decode final
per-pixel Gaussian parameters directly from these tokens for differentiable
rendering. In contrast to previous LRMs that can only reconstruct objects, by
predicting per-pixel Gaussians, GS-LRM naturally handles scenes with large
variations in scale and complexity. We show that our model can work on both
object and scene captures by training it on Objaverse and RealEstate10K
respectively. In both scenarios, the models outperform state-of-the-art
baselines by a wide margin. We also demonstrate applications of our model in
downstream 3D generation tasks. Our project webpage is available at:
https://sai-bi.github.io/project/gs-lrm/.
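To make the described data flow concrete, below is a minimal PyTorch sketch of the pipeline the abstract outlines: patchify posed input views, run the concatenated multi-view tokens through transformer blocks, and decode per-pixel Gaussian parameters. This is an illustrative sketch, not the authors' released code; the class name `GSLRMSketch`, the 9-channel input (RGB plus an assumed per-pixel pose/ray encoding), and the 12-channel Gaussian parameterization are all assumptions.

```python
# Minimal sketch of a GS-LRM-style forward pass (assumptions noted inline).
import torch
import torch.nn as nn

class GSLRMSketch(nn.Module):
    def __init__(self, patch=8, dim=768, depth=12, heads=12, in_ch=9, gauss_ch=12):
        super().__init__()
        self.patch = patch
        # Linear patch embedding over image channels. The 9 input channels
        # (RGB + a per-pixel camera/pose encoding) are an assumption; the pose
        # channels also stand in for positional information here.
        self.embed = nn.Linear(in_ch * patch * patch, dim)
        block = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, depth)
        # Decode Gaussian parameters for every pixel of every patch
        # (gauss_ch = e.g. 3 color + 3 scale + 4 rotation + 1 opacity + 1 depth,
        # an assumed split).
        self.head = nn.Linear(dim, gauss_ch * patch * patch)

    def forward(self, images):
        # images: (B, V, C, H, W) -- V posed views with pose channels appended.
        B, V, C, H, W = images.shape
        p = self.patch
        # Patchify each view into non-overlapping p x p patches.
        x = images.reshape(B, V, C, H // p, p, W // p, p)
        x = x.permute(0, 1, 3, 5, 2, 4, 6).reshape(
            B, V * (H // p) * (W // p), C * p * p)
        # Concatenated multi-view tokens attend to each other jointly.
        tokens = self.blocks(self.embed(x))
        g = self.head(tokens)  # per-patch Gaussian parameters
        # Unpatchify so every input pixel yields one Gaussian.
        g = g.reshape(B, V, H // p, W // p, -1, p, p)
        g = g.permute(0, 1, 4, 2, 5, 3, 6).reshape(B, V, -1, H, W)
        return g  # (B, V, gauss_ch, H, W): per-pixel Gaussian parameters

# Usage: two 256x256 posed views -> one Gaussian per input pixel.
model = GSLRMSketch()
gaussians = model(torch.randn(1, 2, 9, 256, 256))
```

The decoded parameters would then feed a differentiable Gaussian-splatting renderer for supervision; that renderer is outside the scope of this sketch.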