VGGT-X: When VGGT Meets Dense Novel View Synthesis
September 29, 2025
Authors: Yang Liu, Chuanchen Luo, Zimo Tang, Junran Peng, Zhaoxiang Zhang
cs.AI
Abstract
We study the problem of applying 3D Foundation Models (3DFMs) to dense Novel
View Synthesis (NVS). Despite significant progress in Novel View Synthesis
powered by NeRF and 3DGS, current approaches remain reliant on accurate 3D
attributes (e.g., camera poses and point clouds) acquired from
Structure-from-Motion (SfM), which is often slow and fragile in low-texture or
low-overlap captures. Recent 3DFMs deliver orders-of-magnitude speedups over
the traditional pipeline and show great potential for online NVS. However, most
validation and conclusions so far are confined to sparse-view settings. Our study
reveals that naively scaling 3DFMs to dense views encounters two fundamental
barriers: dramatically increasing VRAM burden and imperfect outputs that
degrade initialization-sensitive 3D training. To address these barriers, we
introduce VGGT-X, incorporating a memory-efficient VGGT implementation that
scales to 1,000+ images, an adaptive global alignment for VGGT output
enhancement, and robust 3DGS training practices. Extensive experiments show
that these measures substantially close the fidelity gap with
COLMAP-initialized pipelines, achieving state-of-the-art results in dense
COLMAP-free NVS and pose estimation. Additionally, we analyze the causes of
remaining gaps with COLMAP-initialized rendering, providing insights for the
future development of 3D foundation models and dense NVS. Our project page is
available at https://dekuliutesla.github.io/vggt-x.github.io/