
GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

April 16, 2026
Authors: Roni Itkin, Noam Issachar, Yehonatan Keypur, Anpei Chen, Sagie Benaim
cs.AI

Abstract

The efficient spatial allocation of primitives serves as the foundation of 3D Gaussian Splatting, as it directly dictates the synergy between representation compactness, reconstruction speed, and rendering fidelity. Previous solutions, whether based on iterative optimization or feed-forward inference, suffer from significant trade-offs between these goals, mainly due to their reliance on local, heuristic-driven allocation strategies that lack global scene awareness. Specifically, current feed-forward methods are largely pixel-aligned or voxel-aligned. By unprojecting pixels into dense, view-aligned primitives, they bake redundancy into the 3D asset. As more input views are added, the representation size increases and global consistency becomes fragile. To this end, we introduce GlobalSplat, a framework built on the principle of align first, decode later. Our approach learns a compact, global, latent scene representation that encodes multi-view input and resolves cross-view correspondences before decoding any explicit 3D geometry. Crucially, this formulation enables compact, globally consistent reconstructions without relying on pretrained pixel-prediction backbones or reusing latent features from dense baselines. By utilizing a coarse-to-fine training curriculum that gradually increases decoded capacity, GlobalSplat natively prevents representation bloat. On RealEstate10K and ACID, our model achieves competitive novel-view synthesis performance while using as few as 16K Gaussians, significantly fewer than dense pipelines require, yielding a lightweight 4 MB storage footprint. Further, GlobalSplat enables significantly faster inference than the baselines, completing a single forward pass in under 78 milliseconds. The project page is available at https://r-itk.github.io/globalsplat/
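The key structural claim above is that a fixed budget of global scene tokens, rather than per-pixel unprojection, decouples the Gaussian count from the number of input views. The toy sketch below illustrates only that idea; the dimensions, the dot-product attention, and the random linear decoder are illustrative assumptions, not the paper's actual architecture.

```python
import math
import random

random.seed(0)

D = 8                    # toy latent dimension (assumption, not the paper's)
N_TOKENS = 4             # fixed budget of global scene tokens
GAUSSIANS_PER_TOKEN = 2  # each token decodes to a fixed number of Gaussians

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(tokens, feats):
    """'Align first': each global token aggregates evidence from ALL per-view
    features via dot-product attention, fusing cross-view information in
    latent space before any explicit geometry is decoded."""
    fused = []
    for q in tokens:
        w = softmax([sum(a * b for a, b in zip(q, f)) for f in feats])
        fused.append([sum(wi * f[i] for wi, f in zip(w, feats))
                      for i in range(D)])
    return fused

# A random linear head standing in for the learned Gaussian decoder:
# 14 parameters per Gaussian (3 mean + 3 scale + 4 rotation + 1 opacity + 3 color).
W = [[[random.gauss(0, 0.1) for _ in range(D)] for _ in range(14)]
     for _ in range(GAUSSIANS_PER_TOKEN)]

def decode(token):
    """'Decode later': one scene token -> a fixed handful of Gaussians."""
    return [[sum(w * t for w, t in zip(row, token)) for row in head]
            for head in W]

def reconstruct(num_views, patches_per_view=6):
    # Per-view features grow with the number of input views...
    feats = [[random.gauss(0, 1) for _ in range(D)]
             for _ in range(num_views * patches_per_view)]
    tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N_TOKENS)]
    scene = attend(tokens, feats)
    return [g for tok in scene for g in decode(tok)]

# ...but the Gaussian count is fixed by the token budget, unlike
# pixel-aligned unprojection, where it scales with views x pixels.
assert len(reconstruct(2)) == len(reconstruct(8)) == N_TOKENS * GAUSSIANS_PER_TOKEN
```

With pixel-aligned methods, doubling the input views roughly doubles the primitive count; here the output size is a constant of the token budget, which is how a scene can stay at roughly 16K Gaussians regardless of how many views are encoded.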