

VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

May 12, 2026
作者: Jimin Tang, Wenyuan Zhang, Junsheng Zhou, Zian Huang, Kanle Shi, Shenkun Xu, Yu-Shen Liu, Zhizhong Han
cs.AI

Abstract

Gaussian Splatting has achieved remarkable progress in multi-view surface reconstruction, yet its performance degrades notably when only a few views are available. Although recent efforts alleviate this issue by enhancing multi-view consistency to produce plausible surfaces, they struggle to infer unseen, occluded, or weakly constrained regions beyond the input coverage. To address this limitation, we present VidSplat, a training-free generative reconstruction framework that leverages powerful video diffusion priors to iteratively synthesize novel views that compensate for missing input coverage, thereby recovering complete 3D scenes from sparse inputs. Specifically, we tackle two key challenges in effectively integrating generation and reconstruction. First, for 3D-consistent generation, we design a training-free, stage-wise denoising strategy that adaptively guides the denoising direction toward the underlying geometry using rendered RGB and mask images. Second, to enhance reconstruction, we develop an iterative mechanism that samples camera trajectories, explores unobserved regions, synthesizes novel views, and supplements training through confidence-weighted refinement. VidSplat remains robust to sparse inputs and even a single image. Extensive experiments on widely used benchmarks demonstrate superior performance in sparse-view scene reconstruction.
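The iterative mechanism described above (sample trajectories, render, synthesize with the diffusion prior, refine with confidence weights) can be sketched as an outer loop. The following is a minimal toy sketch based only on the abstract: every function name (`sample_trajectory`, `render_views`, `synthesize_views`, `refine_splats`) and the `Scene` class are illustrative stand-ins, not the authors' API, and the video-diffusion and Gaussian-Splatting internals are stubbed out.

```python
# Hypothetical sketch of VidSplat's outer reconstruction loop.
# All names and numeric behavior here are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class Scene:
    """Toy stand-in for a Gaussian-Splatting scene state."""
    views: list = field(default_factory=list)  # (camera, confidence) pairs
    coverage: float = 0.0                      # fraction of scene observed

def sample_trajectory(scene, n_cams=4):
    # Sample camera poses aimed at weakly observed regions (stubbed).
    return [f"cam_{len(scene.views) + i}" for i in range(n_cams)]

def render_views(scene, cams):
    # Render an RGB image and a mask per camera; the mask flags regions
    # the current splats cannot yet explain (stubbed as a scalar).
    return [{"cam": c, "mask_unseen": 1.0 - scene.coverage} for c in cams]

def synthesize_views(renders):
    # Stage-wise denoising with a video diffusion prior, guided by the
    # rendered RGB and mask images (stubbed). Confidence is higher where
    # the render already constrained the generation.
    return [(r["cam"], 1.0 - 0.5 * r["mask_unseen"]) for r in renders]

def refine_splats(scene, generated):
    # Supplement training with confidence-weighted generated views (stubbed).
    scene.views.extend(generated)
    avg_conf = sum(c for _, c in generated) / len(generated)
    scene.coverage = min(1.0, scene.coverage + 0.3 * avg_conf)

def vidsplat_loop(scene, iters=5):
    for _ in range(iters):
        cams = sample_trajectory(scene)        # 1. sample a camera trajectory
        renders = render_views(scene, cams)    # 2. render RGB + masks
        generated = synthesize_views(renders)  # 3. diffusion-prior synthesis
        refine_splats(scene, generated)        # 4. confidence-weighted refinement
    return scene

scene = vidsplat_loop(Scene())
```

In this sketch, coverage grows each iteration because newly generated views constrain later renders, which mirrors the paper's claim that iterating generation and reconstruction compensates for missing input coverage.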