VGGT-X: When VGGT Meets Dense Novel View Synthesis
September 29, 2025
Authors: Yang Liu, Chuanchen Luo, Zimo Tang, Junran Peng, Zhaoxiang Zhang
cs.AI
Abstract
We study the problem of applying 3D Foundation Models (3DFMs) to dense Novel
View Synthesis (NVS). Despite significant progress in Novel View Synthesis
powered by NeRF and 3DGS, current approaches remain reliant on accurate 3D
attributes (e.g., camera poses and point clouds) acquired from
Structure-from-Motion (SfM), which is often slow and fragile in low-texture or
low-overlap captures. Recent 3DFMs showcase orders-of-magnitude speedups over
the traditional pipeline and great potential for online NVS, but most
validation and conclusions are confined to sparse-view settings. Our study
reveals that naively scaling 3DFMs to dense views encounters two fundamental
barriers: a dramatically increasing VRAM burden and imperfect outputs that
degrade initialization-sensitive 3D training. To address these barriers, we
introduce VGGT-X, incorporating a memory-efficient VGGT implementation that
scales to 1,000+ images, an adaptive global alignment for VGGT output
enhancement, and robust 3DGS training practices. Extensive experiments show
that these measures substantially close the fidelity gap with
COLMAP-initialized pipelines, achieving state-of-the-art results in dense
COLMAP-free NVS and pose estimation. Additionally, we analyze the causes of
remaining gaps with COLMAP-initialized rendering, providing insights for the
future development of 3D foundation models and dense NVS. Our project page is
available at https://dekuliutesla.github.io/vggt-x.github.io/