Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
March 27, 2024
Authors: Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
cs.AI
Abstract
We tackle the challenge of efficiently reconstructing a 3D asset from a
single image, a task made pressing by growing demand for automated 3D content
creation pipelines.
Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural
Radiance Fields (NeRF). Despite their significant success, these approaches
encounter practical limitations due to lengthy optimization and considerable
memory usage. In this report, we introduce Gamba, an end-to-end amortized
model for 3D reconstruction from single-view images, built on two main insights:
(1) 3D representation: leveraging a large number of 3D Gaussians for an
efficient 3D Gaussian splatting process; (2) Backbone design: introducing a
Mamba-based sequential network that facilitates context-dependent reasoning and
linear scalability with the sequence (token) length, accommodating a
substantial number of Gaussians. Gamba incorporates significant advancements in
data preprocessing, regularization design, and training methodologies. We
assessed Gamba against existing optimization-based and feed-forward 3D
generation approaches on the real-world scanned OmniObject3D dataset. In these
comparisons, Gamba demonstrates competitive generation capabilities, both
qualitatively and quantitatively, while achieving remarkable speed: approximately
0.6 seconds on a single NVIDIA A100 GPU.
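The claim of linear scalability with sequence (token) length can be illustrated with a toy selective state-space scan in the style of Mamba. This is a minimal sketch, not Gamba's backbone: it assumes a single input channel, a diagonal state matrix A, a zero-order-hold discretization of A with a simplified (Euler) discretization of B, and hypothetical weight names (`w_delta`, `b_delta`, `w_b`, `w_c`). Each token updates a fixed-size hidden state once, so the cost grows as O(L·N) in sequence length L and state size N, rather than with the O(L²) pairwise interactions of self-attention.

```python
import math
import random


def softplus(z: float) -> float:
    """Numerically stable softplus; keeps the step size Delta_t positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a, w_delta, b_delta, w_b, w_c):
    """Single-channel selective state-space scan over a token sequence.

    xs: input sequence of length L (one scalar feature per token)
    a: diagonal entries of the state matrix A (length N, negative for stability)
    w_delta, b_delta: hypothetical weights that make the step size input-dependent
    w_b, w_c: hypothetical per-state weights that make B_t and C_t input-dependent

    Each token triggers one O(N) state update, so the total cost is O(L * N).
    """
    n = len(a)
    if len(w_b) != n or len(w_c) != n:
        raise ValueError("w_b and w_c must match the state size")
    h = [0.0] * n  # hidden state; its size does not grow with L
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        b_t = [wb * x for wb in w_b]             # input-dependent B_t
        c_t = [wc * x for wc in w_c]             # input-dependent C_t
        for i in range(n):
            a_bar = math.exp(delta * a[i])            # zero-order-hold discretization of A
            h[i] = a_bar * h[i] + delta * b_t[i] * x  # Euler-style discretization of B
        ys.append(sum(c * s for c, s in zip(c_t, h)))  # y_t = C_t . h_t
    return ys


if __name__ == "__main__":
    random.seed(0)
    n_state = 4
    a = [-(i + 1.0) for i in range(n_state)]
    w_b = [random.uniform(-1.0, 1.0) for _ in range(n_state)]
    w_c = [random.uniform(-1.0, 1.0) for _ in range(n_state)]
    # Arbitrary toy length standing in for a long sequence of Gaussian tokens.
    tokens = [random.gauss(0.0, 1.0) for _ in range(1024)]
    ys = selective_scan(tokens, a, 0.5, 0.0, w_b, w_c)
    print(len(ys))  # 1024
```

Because Δ_t, B_t, and C_t are computed from the current token, the recurrence decides per token how much past context to retain, which is one way to read the abstract's "context-dependent reasoning." Production Mamba implementations run this recurrence as a hardware-aware parallel scan on the GPU rather than as a Python loop.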
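The abstract's first insight relies on representing the asset as a large number of 3D Gaussians, but it does not spell out Gamba's per-Gaussian parameters. As a hedged sketch, assuming the parameterization common in 3D Gaussian splatting (center, per-axis scale, rotation quaternion, opacity, and RGB color, 14 values in total, with view-dependent spherical-harmonic color omitted), the code below shows how one raw output vector per token could be mapped to a valid Gaussian and its covariance Σ = R S Sᵀ Rᵀ. The 14-value layout and the names `Gaussian3D`, `decode_gaussian`, and `covariance` are hypothetical and are not taken from Gamba's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    position: tuple  # center (x, y, z)
    scale: tuple     # positive per-axis scales (s_x, s_y, s_z)
    rotation: tuple  # unit quaternion (w, x, y, z)
    opacity: float   # in [0, 1]
    color: tuple     # RGB, each in [0, 1]


def stable_sigmoid(z: float) -> float:
    """Logistic function that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_gaussian(raw):
    """Turn a hypothetical 14-value raw vector into a valid Gaussian.

    Assumed layout: [0:3] position, [3:6] log-scale, [6:10] quaternion,
    [10] opacity logit, [11:14] color logits.
    """
    if len(raw) != 14:
        raise ValueError("expected 14 raw values per Gaussian")
    position = tuple(raw[0:3])
    scale = tuple(math.exp(v) for v in raw[3:6])  # exp keeps scales positive
    q = raw[6:10]
    norm = math.sqrt(sum(v * v for v in q))
    # Normalize to a unit quaternion; fall back to the identity rotation.
    rotation = tuple(v / norm for v in q) if norm > 0 else (1.0, 0.0, 0.0, 0.0)
    opacity = stable_sigmoid(raw[10])  # squash the logit into [0, 1]
    color = tuple(stable_sigmoid(v) for v in raw[11:14])
    return Gaussian3D(position, scale, rotation, opacity, color)


def covariance(g: Gaussian3D):
    """Sigma = R S S^T R^T, with R from the quaternion and S = diag(scale)."""
    w, x, y, z = g.rotation
    r = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
    s2 = [s * s for s in g.scale]
    # Sigma_ij = sum_k R_ik * s_k^2 * R_jk
    return [
        [sum(r[i][k] * s2[k] * r[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


if __name__ == "__main__":
    raw = [0.0] * 3 + [0.0, math.log(2.0), math.log(3.0)] + [1.0, 0.0, 0.0, 0.0] + [0.0] * 4
    g = decode_gaussian(raw)
    print(g.opacity)  # 0.5
    print(covariance(g))
```

The activations (exp for scale, normalization for the quaternion, sigmoid for opacity and color) keep every decoded Gaussian valid regardless of what the network outputs, which is why unconstrained raw values can be predicted per token.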