TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization

Abstract

Training 3D Gaussian Splatting (3DGS) at billion-primitive scale is fundamentally memory-bound: each Gaussian primitive carries a large attribute vector, and the aggregate parameter table quickly exceeds GPU capacity, limiting prior systems to tens of millions of Gaussians on commodity single-GPU hardware. We observe that 3DGS training is inherently sparse and trajectory-conditioned: each iteration activates only the Gaussians visible from the current camera batch, so GPU memory can serve as a working-set cache rather than a persistent parameter store. Building on this insight, we introduce TideGS, an out-of-core training framework that manages parameters across an SSD-CPU-GPU hierarchy via three synergistic techniques: block-virtualized geometry for SSD-aligned spatial locality, a hierarchical asynchronous pipeline to overlap I/O with computation, and trajectory-adaptive differential streaming that transfers only incremental working-set deltas between iterations. Experiments show that TideGS enables training with over one billion Gaussians on a single 24 GB GPU while achieving the best reconstruction quality among evaluated single-GPU baselines on large-scale scenes, scaling beyond prior out-of-core baselines (e.g., approximately 100M Gaussians) and standard in-memory training (e.g., approximately 11M Gaussians).
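
To make the memory argument and the differential-streaming idea concrete, the following is a minimal sketch and not the TideGS implementation. It assumes the standard 3DGS attribute layout of 59 float32 values per Gaussian (position, scale, rotation, opacity, and degree-3 spherical harmonics); the abstract does not state the layout TideGS uses. The block IDs, function names, and the evict-everything-not-visible policy are hypothetical simplifications.

```python
# Assumption: standard 3DGS layout, not confirmed for TideGS.
# 3 position + 3 scale + 4 rotation + 1 opacity + 48 SH coefficients = 59 floats.
FLOATS_PER_GAUSSIAN = 59
BYTES_PER_FLOAT = 4  # float32


def parameter_table_bytes(num_gaussians: int) -> int:
    """Size of the raw parameter table, excluding gradients and optimizer state."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT


def working_set_delta(resident: set, visible: set):
    """Return (blocks to load, blocks to evict) when moving to a new camera batch."""
    to_load = visible - resident    # newly visible blocks, not yet on the GPU
    to_evict = resident - visible   # blocks no longer needed by this batch
    return to_load, to_evict


def stream_along_trajectory(visible_per_iter):
    """Count block transfers when only working-set deltas are moved each iteration."""
    resident = set()
    transferred = 0
    for visible in visible_per_iter:
        to_load, to_evict = working_set_delta(resident, visible)
        transferred += len(to_load)
        # A real system would write updated blocks back to CPU/SSD before eviction.
        resident = (resident - to_evict) | to_load
    return transferred, resident


if __name__ == "__main__":
    gib = parameter_table_bytes(1_000_000_000) / 2**30
    print(f"1B Gaussians: {gib:.1f} GiB of parameters alone")

    # Hypothetical trajectory: consecutive camera batches see overlapping blocks.
    trajectory = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
    moved, resident = stream_along_trajectory(trajectory)
    full_reload = sum(len(v) for v in trajectory)
    print(f"delta streaming moved {moved} blocks vs {full_reload} for full reloads")
```

Under these assumptions, one billion Gaussians need 236 GB for parameters alone, roughly ten times the capacity of a 24 GB GPU before gradients or optimizer state are counted. In the toy trajectory, overlapping visibility between consecutive batches means only the newly visible blocks are transferred, which is the effect the abstract attributes to differential streaming.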