

VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

September 23, 2025
作者: Weijie Wang, Yeqing Chen, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Zheng Zhu, Donny Y. Chen, Bohan Zhuang
cs.AI

Abstract

Feed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective solution for novel view synthesis. Existing methods predominantly rely on a pixel-aligned Gaussian prediction paradigm, where each 2D pixel is mapped to a 3D Gaussian. We rethink this widely adopted formulation and identify several inherent limitations: it renders the reconstructed 3D models heavily dependent on the number of input views, leads to view-biased density distributions, and introduces alignment errors, particularly when source views contain occlusions or low texture. To address these challenges, we introduce VolSplat, a new multi-view feed-forward paradigm that replaces pixel alignment with voxel-aligned Gaussians. By directly predicting Gaussians from a predicted 3D voxel grid, it overcomes pixel alignment's reliance on error-prone 2D feature matching, ensuring robust multi-view consistency. Furthermore, it enables adaptive control over Gaussian density based on 3D scene complexity, yielding more faithful Gaussian point clouds, improved geometric consistency, and enhanced novel-view rendering quality. Experiments on widely used benchmarks including RealEstate10K and ScanNet demonstrate that VolSplat achieves state-of-the-art performance while producing more plausible and view-consistent Gaussian reconstructions. In addition to superior results, our approach establishes a more scalable framework for feed-forward 3D reconstruction with denser and more robust representations, paving the way for further research in wider communities. The video results, code and trained models are available on our project page: https://lhmd.top/volsplat.
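The contrast the abstract draws can be made concrete: under pixel alignment the number of Gaussians is fixed by image resolution and view count, while under voxel alignment it follows occupied 3D space, so overlapping observations of the same surface collapse into shared primitives. The following is a minimal NumPy sketch of that counting behavior only; the function names, the voxel size, and the choice of voxel centers as Gaussian positions are illustrative assumptions, not the paper's learned decoder.

```python
import numpy as np

def pixel_aligned_count(num_views, height, width):
    # Pixel-aligned paradigm: one Gaussian per input pixel, so the
    # primitive count grows linearly with the number of input views.
    return num_views * height * width

def voxel_aligned_centers(points, voxel_size):
    # Voxel-aligned paradigm (sketch): fuse multi-view 3D points into a
    # shared grid and keep one Gaussian center per occupied voxel, so
    # density tracks scene geometry rather than image resolution.
    voxel_ids = np.floor(points / voxel_size).astype(np.int64)
    occupied = np.unique(voxel_ids, axis=0)
    # Place each Gaussian at its voxel center (a stand-in for decoding
    # per-voxel features into Gaussian parameters, as in the paper).
    return (occupied + 0.5) * voxel_size

# Two views observing the same surface yield overlapping point clouds.
view_a = np.array([[0.10, 0.10, 1.00], [0.90, 0.10, 1.00]])
view_b = np.array([[0.12, 0.11, 1.02], [1.90, 0.10, 1.00]])
points = np.concatenate([view_a, view_b], axis=0)

centers = voxel_aligned_centers(points, voxel_size=0.5)
# The two overlapping observations fall into the same voxel, so only
# 3 centers remain for 4 input points, while the pixel-aligned count
# would stay fixed at num_views * H * W regardless of overlap.
```

The point of the sketch is the adaptive density claim: duplicated pixel observations of one surface patch no longer each spawn a Gaussian, which is why the paper reports more view-consistent reconstructions.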