
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

October 16, 2024
Authors: Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, Zexiang Xu
cs.AI

Abstract

We propose Long-LRM, a generalizable 3D Gaussian reconstruction model capable of reconstructing a large scene from a long sequence of input images. Specifically, our model can process 32 source images at 960x540 resolution within only 1.3 seconds on a single A100 80G GPU. Our architecture features a mixture of recent Mamba2 blocks and classical transformer blocks, allowing many more tokens to be processed than in prior work, enhanced by efficient token merging and Gaussian pruning steps that balance quality and efficiency. Unlike previous feed-forward models, which are limited to 1~4 input images and can reconstruct only a small portion of a large scene, Long-LRM reconstructs the entire scene in a single feed-forward step. On large-scale scene datasets such as DL3DV-140 and Tanks and Temples, our method achieves performance comparable to optimization-based approaches while being two orders of magnitude more efficient. Project page: https://arthurhero.github.io/projects/llrm
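The abstract's token merging and Gaussian pruning steps can be illustrated with a minimal, hypothetical Python sketch. The function names, the fixed-group averaging scheme, and the opacity threshold below are illustrative assumptions, not the paper's actual implementation:

```python
def merge_tokens(tokens, group=4):
    """Average each consecutive group of `group` token vectors into one.

    Hypothetical stand-in for a token-merging step: it shortens the
    token sequence by `group`x before the heavier transformer blocks.
    `tokens` is a list of equal-length feature vectors (lists of floats).
    """
    assert len(tokens) % group == 0, "sequence length must divide evenly"
    merged = []
    for i in range(0, len(tokens), group):
        chunk = tokens[i:i + group]
        dim = len(chunk[0])
        merged.append([sum(v[j] for v in chunk) / group for j in range(dim)])
    return merged


def prune_gaussians(gaussians, opacities, threshold=0.01):
    """Drop Gaussians whose opacity is at or below `threshold`.

    Sketch of an opacity-based pruning pass: near-transparent Gaussians
    contribute little to rendering, so removing them trades a small
    quality loss for a smaller scene representation.
    """
    return [g for g, a in zip(gaussians, opacities) if a > threshold]
```

For example, merging four 2-D tokens with `group=2` yields two averaged tokens, and pruning with the default threshold removes only near-zero-opacity Gaussians.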

