VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

Abstract

Feed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective solution for novel view synthesis. Existing methods predominantly rely on a pixel-aligned Gaussian prediction paradigm, where each 2D pixel is mapped to a 3D Gaussian. We rethink this widely adopted formulation and identify several inherent limitations: it renders the reconstructed 3D models heavily dependent on the number of input views, leads to view-biased density distributions, and introduces alignment errors, particularly when source views contain occlusions or low texture. To address these challenges, we introduce VolSplat, a new multi-view feed-forward paradigm that replaces pixel alignment with voxel-aligned Gaussians. By directly predicting Gaussians from a predicted 3D voxel grid, it overcomes pixel alignment's reliance on error-prone 2D feature matching, ensuring robust multi-view consistency. Furthermore, it enables adaptive control over Gaussian density based on 3D scene complexity, yielding more faithful Gaussian point clouds, improved geometric consistency, and enhanced novel-view rendering quality. Experiments on widely used benchmarks including RealEstate10K and ScanNet demonstrate that VolSplat achieves state-of-the-art performance while producing more plausible and view-consistent Gaussian reconstructions. In addition to superior results, our approach establishes a more scalable framework for feed-forward 3D reconstruction with denser and more robust representations, paving the way for further research in wider communities. The video results, code and trained models are available on our project page: https://lhmd.top/volsplat.
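
The contrast between pixel-aligned and voxel-aligned prediction can be illustrated with a toy example. The sketch below is not VolSplat's network or code; it only shows the counting argument under simplifying assumptions: 3D points are taken as already unprojected from each view, and a "Gaussian" is reduced to a center position. The function `voxelize`, the toy views, and the voxel size are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group 3D points by the voxel containing them and return one mean
    position per occupied voxel (a stand-in for a voxel-aligned Gaussian center)."""
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))
    centers = {}
    for key, pts in buckets.items():
        n = len(pts)
        centers[key] = tuple(sum(p[i] for p in pts) / n for i in range(3))
    return centers


# Hypothetical toy data: three "views" whose pixels unproject onto the same
# flat wall at z = 1.0, each with a slightly different sampling offset.
view_a = [(0.1 * i + 0.02, 0.0, 1.0) for i in range(10)]
view_b = [(0.1 * i + 0.03, 0.0, 1.0) for i in range(10)]
view_c = [(0.1 * i + 0.04, 0.0, 1.0) for i in range(10)]

# Pixel-aligned: one Gaussian per pixel, so the count grows with every view.
pixel_aligned_two_views = len(view_a + view_b)
pixel_aligned_three_views = len(view_a + view_b + view_c)

# Voxel-aligned: one Gaussian per occupied voxel, so the count follows
# how much of the scene is occupied rather than how many views observe it.
voxel_aligned_two_views = len(voxelize(view_a + view_b, voxel_size=0.25))
voxel_aligned_three_views = len(voxelize(view_a + view_b + view_c, voxel_size=0.25))

print(pixel_aligned_two_views, pixel_aligned_three_views)
print(voxel_aligned_two_views, voxel_aligned_three_views)
```

In this toy setup the pixel-aligned count rises when the third view is added, while the voxel-aligned count stays the same because the extra view observes voxels that are already occupied. Choosing a smaller `voxel_size` in complex regions would be one way to vary density with scene complexity, though how VolSplat actually controls density is not specified in the abstract.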
