VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
September 23, 2025
Authors: Weijie Wang, Yeqing Chen, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Zheng Zhu, Donny Y. Chen, Bohan Zhuang
cs.AI
Abstract
Feed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective
solution for novel view synthesis. Existing methods predominantly rely on a
pixel-aligned Gaussian prediction paradigm, where each 2D pixel is mapped to a
3D Gaussian. We rethink this widely adopted formulation and identify several
inherent limitations: it renders the reconstructed 3D models heavily dependent
on the number of input views, leads to view-biased density distributions, and
introduces alignment errors, particularly when source views contain occlusions
or low texture. To address these challenges, we introduce VolSplat, a new
multi-view feed-forward paradigm that replaces pixel alignment with
voxel-aligned Gaussians. By predicting Gaussians directly on a predicted 3D
voxel grid, VolSplat sidesteps pixel alignment's reliance on error-prone 2D
feature matching and ensures robust multi-view consistency. Furthermore, it enables
adaptive control over Gaussian density based on 3D scene complexity, yielding
more faithful Gaussian point clouds, improved geometric consistency, and
enhanced novel-view rendering quality. Experiments on widely used benchmarks
including RealEstate10K and ScanNet demonstrate that VolSplat achieves
state-of-the-art performance while producing more plausible and view-consistent
Gaussian reconstructions. Beyond these superior results, our approach
establishes a more scalable framework for feed-forward 3D reconstruction with
denser and more robust representations, paving the way for further research in
the broader community. Video results, code, and trained models are available on
our project page: https://lhmd.top/volsplat.
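
The contrast the abstract draws between pixel-aligned and voxel-aligned prediction can be made concrete with a minimal sketch. The code below is an illustration only, not the paper's implementation: the helper names (`unproject_depth`, `pixel_aligned_gaussians`, `voxel_aligned_gaussians`), the voxel size, and the use of simple per-voxel averaging in place of VolSplat's learned per-voxel Gaussian head are all assumptions made for clarity.

```python
# Illustrative sketch (assumed shapes and helpers, not the VolSplat API):
# pixel-aligned prediction yields one Gaussian per pixel per view, so the
# primitive count scales with the number of views and overlapping views add
# redundant, view-biased points; voxel-aligned prediction fuses all views
# into one sparse voxel grid and keeps one Gaussian per occupied voxel.
import torch

def unproject_depth(depth, K, cam2world):
    """Lift a per-pixel depth map (H, W) to world-space 3D points (H*W, 3)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()       # (H, W, 3)
    rays = pix.reshape(-1, 3) @ torch.linalg.inv(K).T                   # camera-space rays
    pts_cam = rays * depth.reshape(-1, 1)                               # scale by depth
    pts_h = torch.cat([pts_cam, torch.ones(H * W, 1)], dim=-1)          # homogeneous coords
    return (pts_h @ cam2world.T)[:, :3]

def pixel_aligned_gaussians(depths, Ks, cam2worlds):
    """One Gaussian centre per input pixel: V*H*W centres in total."""
    centers = [unproject_depth(d, K, c) for d, K, c in zip(depths, Ks, cam2worlds)]
    return torch.cat(centers, dim=0)                                     # (V*H*W, 3)

def voxel_aligned_gaussians(depths, Ks, cam2worlds, voxel_size=0.05):
    """One Gaussian centre per occupied voxel, so density follows 3D scene
    occupancy rather than the 2D pixel lattice of any particular camera."""
    pts = pixel_aligned_gaussians(depths, Ks, cam2worlds)
    vox = torch.floor(pts / voxel_size).long()                           # integer voxel indices
    uniq, inverse = torch.unique(vox, dim=0, return_inverse=True)
    # Average the points falling into each voxel; a stand-in for a learned
    # per-voxel Gaussian prediction head.
    centers = torch.zeros(uniq.shape[0], 3).index_add_(0, inverse, pts)
    counts = torch.zeros(uniq.shape[0]).index_add_(0, inverse, torch.ones(pts.shape[0]))
    return centers / counts.unsqueeze(-1)                                # (N_voxels, 3)

if __name__ == "__main__":
    V, H, W = 4, 64, 64                                                  # toy multi-view setup
    depths = torch.rand(V, H, W) * 2 + 1
    Ks = torch.eye(3).expand(V, 3, 3).clone()
    cam2worlds = torch.eye(4).expand(V, 4, 4).clone()
    px = pixel_aligned_gaussians(depths, Ks, cam2worlds)
    vx = voxel_aligned_gaussians(depths, Ks, cam2worlds)
    print(f"pixel-aligned: {px.shape[0]} Gaussians, voxel-aligned: {vx.shape[0]}")
```

In this toy setup the pixel-aligned path always produces V*H*W Gaussians regardless of how much the views overlap, while the voxel-aligned path collapses redundant observations of the same region into a single primitive, which is the density-control behaviour the abstract attributes to VolSplat.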