TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization
May 19, 2026
Authors: Chonghao Zhong, Linfeng Shi, Hua Chen, Tiecheng Sun, Hao Zhao, Binhang Yuan, Chaojian Li
cs.AI
Abstract
Training 3D Gaussian Splatting (3DGS) at billion-primitive scale is fundamentally memory-bound: each Gaussian primitive carries a large attribute vector, and the aggregate parameter table quickly exceeds GPU capacity, limiting prior systems to tens of millions of Gaussians on commodity single-GPU hardware. We observe that 3DGS training is inherently sparse and trajectory-conditioned: each iteration activates only the Gaussians visible from the current camera batch, so GPU memory can serve as a working-set cache rather than a persistent parameter store. Building on this insight, we introduce TideGS, an out-of-core training framework that manages parameters across an SSD-CPU-GPU hierarchy via three synergistic techniques: block-virtualized geometry for SSD-aligned spatial locality, a hierarchical asynchronous pipeline to overlap I/O with computation, and trajectory-adaptive differential streaming that transfers only incremental working-set deltas between iterations. Experiments show that TideGS enables training with over one billion Gaussians on a single 24 GB GPU while achieving the best reconstruction quality among evaluated single-GPU baselines on large-scale scenes, scaling beyond prior out-of-core baselines (e.g., approximately 100M Gaussians) and standard in-memory training (e.g., approximately 11M Gaussians).
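To make the memory argument and the differential-streaming idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the standard 3DGS layout of 59 float32 attributes per Gaussian (position, scale, rotation, opacity, and degree-3 spherical harmonics), which the abstract does not confirm for TideGS. The function names, the block IDs, and the evict-everything-not-visible policy are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; names, block IDs, and the eviction policy are assumptions.

def param_table_bytes(num_gaussians: int,
                      floats_per_gaussian: int = 59,  # assumed standard 3DGS layout
                      bytes_per_float: int = 4) -> int:
    """Size of the raw parameter table, ignoring gradients and optimizer state."""
    return num_gaussians * floats_per_gaussian * bytes_per_float


def working_set_delta(resident: set, visible: set) -> tuple:
    """Return (blocks to load, blocks to evict) when moving to a new camera batch."""
    to_load = visible - resident    # needed now, not yet on the GPU
    to_evict = resident - visible   # on the GPU, no longer needed
    return to_load, to_evict


def streamed_block_counts(trajectory: list) -> tuple:
    """Compare block loads for delta streaming against reloading every visible block.

    `trajectory` is a list of sets of block IDs, one set per iteration.
    """
    resident = set()
    delta_loads = 0
    full_loads = 0
    for visible in trajectory:
        to_load, to_evict = working_set_delta(resident, visible)
        delta_loads += len(to_load)
        full_loads += len(visible)
        # A real system would also write evicted (updated) blocks back to CPU or SSD.
        resident = (resident - to_evict) | to_load
    return delta_loads, full_loads


if __name__ == "__main__":
    gb = param_table_bytes(1_000_000_000) / 1e9
    print(f"1B Gaussians, parameters only: {gb:.0f} GB")

    # A camera path that slides across neighbouring blocks.
    path = [{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}]
    delta, full = streamed_block_counts(path)
    print(f"blocks loaded with deltas: {delta}, with full reloads: {full}")
```

Under these assumptions, the parameters alone for one billion Gaussians take about 236 GB, roughly ten times the 24 GB GPU in the abstract, before counting gradients or optimizer state. On the toy camera path, consecutive batches share most of their blocks, so delta streaming loads 6 blocks where full reloads would load 12.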