3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
March 3, 2024
Authors: Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, Wei Xing
cs.AI
Abstract
Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.
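
To make the NTC idea concrete, below is a minimal, hypothetical PyTorch sketch of a transformation cache in the spirit described above: a compact network that maps each Gaussian's center to a per-Gaussian translation and rotation, which are then used to warp the previous frame's 3DGs into the current frame while the Gaussians themselves stay fixed. The sinusoidal encoding, layer sizes, and names (`positional_encoding`, `NeuralTransformationCache`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Neural Transformation Cache (NTC): a small MLP that,
# given an encoded 3D Gaussian center, predicts a translation and a rotation
# (quaternion) used to warp the previous frame's Gaussians to the current frame.
# Encoding, sizes, and names are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


def positional_encoding(x: torch.Tensor, num_freqs: int = 6) -> torch.Tensor:
    """Simple sinusoidal encoding of Gaussian centers (stand-in for a learned grid encoding)."""
    freqs = 2.0 ** torch.arange(num_freqs, device=x.device) * torch.pi
    angles = x[..., None] * freqs                 # (N, 3, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)              # (N, 3 * 2 * num_freqs)


class NeuralTransformationCache(nn.Module):
    """Maps an encoded Gaussian center to a per-Gaussian translation and rotation."""

    def __init__(self, num_freqs: int = 6, hidden: int = 64):
        super().__init__()
        in_dim = 3 * 2 * num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 7),                 # 3 for translation, 4 for quaternion
        )

    def forward(self, centers: torch.Tensor):
        out = self.mlp(positional_encoding(centers))
        d_xyz = out[:, :3]                        # predicted translation
        d_rot = out[:, 3:]
        # Bias toward the identity quaternion so the cache starts near "no motion".
        d_rot = d_rot + torch.tensor([1.0, 0.0, 0.0, 0.0], device=out.device)
        d_rot = d_rot / d_rot.norm(dim=-1, keepdim=True)
        return d_xyz, d_rot


# Per-frame usage: warp last frame's Gaussian centers, then optimize only the
# compact NTC (not the Gaussians) against the current frame's multi-view images.
ntc = NeuralTransformationCache()
centers_prev = torch.rand(10_000, 3)              # placeholder Gaussian centers
d_xyz, d_rot = ntc(centers_prev)
centers_curr = centers_prev + d_xyz               # each Gaussian's quaternion would
                                                  # be composed with d_rot analogously
```

Because only this small network is optimized and stored per frame, the per-frame training time and storage cost stay far below those of re-optimizing the full 3DG set, which is the efficiency argument the abstract makes.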