
VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

May 12, 2026
作者: Jimin Tang, Wenyuan Zhang, Junsheng Zhou, Zian Huang, Kanle Shi, Shenkun Xu, Yu-Shen Liu, Zhizhong Han
cs.AI

Abstract

Gaussian Splatting has achieved remarkable progress in multi-view surface reconstruction, yet it degrades notably when only a few views are available. Although recent efforts alleviate this issue by enhancing multi-view consistency to produce plausible surfaces, they struggle to infer unseen, occluded, or weakly constrained regions beyond the input coverage. To address this limitation, we present VidSplat, a training-free generative reconstruction framework that leverages powerful video diffusion priors to iteratively synthesize novel views that compensate for missing input coverage, thereby recovering complete 3D scenes from sparse inputs. Specifically, we tackle two key challenges in effectively integrating generation and reconstruction. First, for 3D-consistent generation, we design a training-free, stage-wise denoising strategy that adaptively guides the denoising direction toward the underlying geometry using rendered RGB and mask images. Second, to enhance reconstruction, we develop an iterative mechanism that samples camera trajectories, explores unobserved regions, synthesizes novel views, and supplements training through confidence-weighted refinement. VidSplat remains robust with sparse inputs and even a single image. Extensive experiments on widely used benchmarks demonstrate our superior performance in sparse-view scene reconstruction.
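
The abstract describes two mechanisms: masked guidance that pulls the diffusion model's denoising direction toward renderings of the current Gaussian scene, and an outer loop that alternates view synthesis with confidence-weighted reconstruction. The sketch below illustrates one plausible reading of both; it is not the authors' code. All names (geometry_guided_x0, vidsplat_loop, the callables passed into the loop) are hypothetical, and the linear guidance schedule is an assumption.

```python
import numpy as np

def geometry_guided_x0(x0_pred, rendered_rgb, mask, t, num_steps):
    """Hypothetical geometry-guided blend at one denoising step.

    x0_pred      : (H, W, 3) the diffusion model's clean-frame prediction
    rendered_rgb : (H, W, 3) RGB rendered from the current 3D Gaussians
    mask         : (H, W, 1) 1 where the rendering is reliable, 0 elsewhere

    Inside the mask, the prediction is pulled toward the rendering
    (assumed stronger at noisier steps); outside it, the video prior
    is free to synthesize unseen content.
    """
    w = t / num_steps  # assumed linear guidance schedule
    guided = w * rendered_rgb + (1.0 - w) * x0_pred
    return mask * guided + (1.0 - mask) * x0_pred

def vidsplat_loop(gaussians, sample_trajectory, render, video_denoise,
                  confidence, refine, num_rounds=3):
    """High-level sketch of the iterative generate-and-reconstruct loop.

    Each callable stands in for a component the paper names:
    sample_trajectory -> camera poses toward under-observed regions
    render            -> RGB + coverage mask from the current Gaussians
    video_denoise     -> guided video diffusion sampling (as above)
    confidence        -> per-pixel reliability of synthesized frames
    refine            -> confidence-weighted Gaussian optimization
    """
    for _ in range(num_rounds):
        cams = sample_trajectory(gaussians)          # explore unobserved regions
        rgb, mask = render(gaussians, cams)          # geometric guidance signals
        frames = video_denoise(rgb, mask)            # synthesize novel views
        weights = confidence(frames, rgb, mask)      # down-weight hallucinated pixels
        gaussians = refine(gaussians, frames, cams, weights)
    return gaussians

if __name__ == "__main__":
    # Tiny numeric check of the masked blend on dummy data.
    H, W = 4, 4
    x0 = np.zeros((H, W, 3))
    rgb = np.ones((H, W, 3))
    m = np.ones((H, W, 1))
    out = geometry_guided_x0(x0, rgb, m, t=40, num_steps=50)
    print(out.mean())  # 0.8: strongly guided toward the render early on
```

Under these assumptions, the outer loop never updates the diffusion model itself, which is consistent with the framework being training-free: only the Gaussian scene is optimized, while the frozen video prior fills in coverage gaps.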