VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

Abstract

Feed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective solution for novel view synthesis. Existing methods predominantly rely on a pixel-aligned Gaussian prediction paradigm, in which each 2D pixel is mapped to a 3D Gaussian. We rethink this widely adopted formulation and identify several inherent limitations: it makes the reconstructed 3D model heavily dependent on the number of input views, leads to view-biased density distributions, and introduces alignment errors, particularly when the source views contain occlusions or low-texture regions. To address these challenges, we introduce VolSplat, a new multi-view feed-forward paradigm that replaces pixel-aligned Gaussians with voxel-aligned ones. By predicting Gaussians directly from a predicted 3D voxel grid, it removes pixel alignment's reliance on error-prone 2D feature matching and ensures robust multi-view consistency. It also enables adaptive control of Gaussian density according to 3D scene complexity, yielding more faithful Gaussian point clouds, improved geometric consistency, and higher novel-view rendering quality. Experiments on widely used benchmarks, including RealEstate10K and ScanNet, demonstrate that VolSplat achieves state-of-the-art performance while producing more plausible and view-consistent Gaussian reconstructions. Beyond these results, our approach establishes a more scalable framework for feed-forward 3D reconstruction with denser and more robust representations, paving the way for further research in the wider community. Video results, code, and trained models are available on our project page: https://lhmd.top/volsplat.
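The contrast between the two paradigms can be illustrated with a toy sketch. It assumes per-pixel depth, pinhole intrinsics and camera-to-world poses are already known, and it replaces VolSplat's learned components (the predicted 3D feature volume and the Gaussian parameter heads) with plain averaging. The names `unproject`, `pixel_aligned_centers`, `voxel_aligned_gaussians` and `make_view` are hypothetical and are not taken from the authors' code.

```python
import math
from collections import defaultdict


# Hypothetical helper: pinhole back-projection of pixel (u, v) with depth d
# into world coordinates. K = (fx, fy, cx, cy); pose = (R, t) is camera-to-world,
# with R a 3x3 nested list and t a 3-element list.
def unproject(u, v, d, K, pose):
    fx, fy, cx, cy = K
    p_cam = ((u - cx) * d / fx, (v - cy) * d / fy, d)
    R, t = pose
    return tuple(sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3))


def pixel_aligned_centers(views):
    """Pixel-aligned paradigm: one Gaussian center per pixel of every view."""
    centers = []
    for view in views:
        for v, row in enumerate(view["depth"]):
            for u, d in enumerate(row):
                centers.append(unproject(u, v, d, view["K"], view["pose"]))
    return centers


def voxel_aligned_gaussians(views, voxel_size):
    """Voxel-aligned sketch: scatter all back-projected points and their features
    into a sparse grid, then emit one Gaussian per occupied voxel
    (mean position, mean feature). A learned model would predict these instead."""
    acc = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0, 0])  # sum x, y, z, feature, count
    for view in views:
        for v, row in enumerate(view["depth"]):
            for u, d in enumerate(row):
                p = unproject(u, v, d, view["K"], view["pose"])
                key = tuple(math.floor(c / voxel_size) for c in p)
                a = acc[key]
                for i in range(3):
                    a[i] += p[i]
                a[3] += view["feat"][v][u]
                a[4] += 1
    return {k: (tuple(a[i] / a[4] for i in range(3)), a[3] / a[4]) for k, a in acc.items()}


def make_view(tx, H=4, W=4, depth=2.0, feat=1.0):
    """Hypothetical toy view: a fronto-parallel wall at constant depth,
    camera translated by tx along x, identity rotation."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return {
        "depth": [[depth] * W for _ in range(H)],
        "feat": [[feat] * W for _ in range(H)],
        "K": (2.0, 2.0, 1.5, 1.5),
        "pose": (identity, [tx, 0.0, 0.0]),
    }


if __name__ == "__main__":
    views = [make_view(0.0, feat=1.0), make_view(0.5, feat=3.0)]
    print(len(pixel_aligned_centers(views)))         # 32: 2 views x 16 pixels
    print(len(voxel_aligned_gaussians(views, 1.0)))  # 20 occupied voxels
    views.append(make_view(0.0, feat=1.0))           # a third, redundant view
    print(len(pixel_aligned_centers(views)))         # 48: grows with the view count
    print(len(voxel_aligned_gaussians(views, 1.0)))  # still 20
```

The redundant third view adds 16 more pixel-aligned Gaussians but no new voxels, which mirrors the view-count dependence described above; overlapping observations are fused into shared voxels rather than duplicated. The sketch only averages positions and features, whereas the paper predicts Gaussians from the voxel grid with a learned model.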