GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

April 30, 2024
Authors: Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu
cs.AI

Abstract

We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU. Our model features a very simple transformer-based architecture: we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian parameters directly from these tokens for differentiable rendering. In contrast to previous LRMs that can only reconstruct objects, by predicting per-pixel Gaussians, GS-LRM naturally handles scenes with large variations in scale and complexity. We show that our model can work on both object and scene captures by training it on Objaverse and RealEstate10K respectively. In both scenarios, the models outperform state-of-the-art baselines by a wide margin. We also demonstrate applications of our model in downstream 3D generation tasks. Our project webpage is available at: https://sai-bi.github.io/project/gs-lrm/
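To make the patchify → transformer → per-pixel-Gaussian pipeline concrete, here is a minimal PyTorch sketch of the data flow the abstract describes. All module names, channel counts, and hyperparameters (patch size, embedding width, the 9 input channels assuming RGB plus a 6-D per-pixel ray encoding, and 12 Gaussian parameters per pixel) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GSLRMSketch(nn.Module):
    """Hypothetical sketch of the GS-LRM pipeline described in the abstract.

    Module names and hyperparameters are guesses for illustration only.
    """

    def __init__(self, patch_size=8, dim=768, depth=12, heads=12, in_ch=9):
        super().__init__()
        self.patch_size = patch_size
        # Patchify: a strided conv turns each posed image (RGB plus an
        # assumed 6-D per-pixel ray/pose encoding -> 9 channels) into a
        # grid of tokens.
        self.patchify = nn.Conv2d(in_ch, dim, kernel_size=patch_size,
                                  stride=patch_size)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        # Decode assumed 12 Gaussian parameters per pixel (e.g. color,
        # opacity, scale, rotation, depth) for every pixel in a patch.
        self.unpatchify = nn.Linear(dim, patch_size * patch_size * 12)

    def forward(self, views):
        # views: (B, V, 9, H, W) -- a batch of V posed input views.
        B, V, C, H, W = views.shape
        tokens = self.patchify(views.flatten(0, 1))       # (B*V, dim, h, w)
        tokens = tokens.flatten(2).transpose(1, 2)        # (B*V, h*w, dim)
        tokens = tokens.reshape(B, -1, tokens.shape[-1])  # concat multi-view tokens
        tokens = self.blocks(tokens)                      # joint self-attention
        gauss = self.unpatchify(tokens)                   # per-pixel params per token
        # A faithful unpatchify would rearrange patch grids back into
        # image layout; this reshape only illustrates the output size.
        return gauss.reshape(B, V, H, W, 12)

# Usage sketch: 4 input views of 256x256 -> one Gaussian per input pixel.
model = GSLRMSketch()
gaussians = model(torch.randn(1, 4, 9, 256, 256))  # (1, 4, 256, 256, 12)
```

Because one Gaussian is predicted per input pixel, the size of the reconstruction scales directly with input resolution and view count, which is what lets the same design cover both objects and full scenes.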
