
GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

April 16, 2026
Authors: Roni Itkin, Noam Issachar, Yehonatan Keypur, Anpei Chen, Sagie Benaim
cs.AI

Abstract

The efficient spatial allocation of primitives serves as the foundation of 3D Gaussian Splatting, as it directly dictates the synergy between representation compactness, reconstruction speed, and rendering fidelity. Previous solutions, whether based on iterative optimization or feed-forward inference, suffer from significant trade-offs between these goals, mainly due to their reliance on local, heuristic-driven allocation strategies that lack global scene awareness. Specifically, current feed-forward methods are largely pixel-aligned or voxel-aligned: by unprojecting pixels into dense, view-aligned primitives, they bake redundancy into the 3D asset. As more input views are added, the representation size grows and global consistency becomes fragile. To this end, we introduce GlobalSplat, a framework built on the principle of align first, decode later. Our approach learns a compact, global, latent scene representation that encodes the multi-view input and resolves cross-view correspondences before decoding any explicit 3D geometry. Crucially, this formulation enables compact, globally consistent reconstructions without relying on pretrained pixel-prediction backbones or reusing latent features from dense baselines. Using a coarse-to-fine training curriculum that gradually increases decoded capacity, GlobalSplat natively prevents representation bloat. On RealEstate10K and ACID, our model achieves competitive novel-view synthesis performance while using as few as 16K Gaussians, far fewer than dense pipelines require, yielding a light 4 MB footprint. Further, GlobalSplat enables significantly faster inference than the baselines, completing a single forward pass in under 78 milliseconds. Project page: https://r-itk.github.io/globalsplat/
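The "align first, decode later" principle can be illustrated with a minimal NumPy sketch. Everything here is an illustrative assumption rather than the paper's actual architecture: the token count, feature widths, the single-head cross-attention, and the 14-parameter Gaussian layout (position, scale, rotation quaternion, opacity, RGB) are all hypothetical. The point the sketch makes is structural: a fixed budget of global tokens attends over however many view features are supplied, so the number of decoded Gaussians (here 256 × 64 = 16,384, matching the paper's 16K figure) is independent of the number of input views.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(tokens, feats, d=64):
    # Hypothetical single-head cross-attention: global tokens (queries)
    # aggregate information from all view features (keys/values).
    Wq = rng.standard_normal((tokens.shape[-1], d)) / np.sqrt(tokens.shape[-1])
    Wk = rng.standard_normal((feats.shape[-1], d)) / np.sqrt(feats.shape[-1])
    Wv = rng.standard_normal((feats.shape[-1], tokens.shape[-1])) / np.sqrt(feats.shape[-1])
    q, k, v = tokens @ Wq, feats @ Wk, feats @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))
    return tokens + attn @ v

N_TOKENS, C = 256, 128   # fixed global-token budget (assumed sizes)
G_PER_TOKEN = 64         # 256 * 64 = 16,384 Gaussians, matching the 16K count
PARAMS_PER_G = 14        # xyz(3)+scale(3)+quat(4)+opacity(1)+rgb(3), assumed layout

def reconstruct(view_feats_list):
    tokens = rng.standard_normal((N_TOKENS, C))
    # Any number of views: their features are simply pooled into one set.
    feats = np.concatenate(view_feats_list, axis=0)
    tokens = cross_attend(tokens, feats)                    # "align first"
    Wdec = rng.standard_normal((C, G_PER_TOKEN * PARAMS_PER_G)) / np.sqrt(C)
    return (tokens @ Wdec).reshape(-1, PARAMS_PER_G)        # "decode later"

for n_views in (2, 8):
    feats = [rng.standard_normal((1024, C)) for _ in range(n_views)]
    print(n_views, reconstruct(feats).shape)  # Gaussian count stays fixed
```

Note the contrast with a pixel-aligned pipeline, where 8 views at 1024 feature locations each would emit 8,192 primitives and keep growing with every added view; here the bottleneck of 256 global tokens caps the output at 16,384 Gaussians regardless of the input count.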