Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

September 12, 2024
作者: Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu
cs.AI

Abstract

Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impedes broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.
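The core idea of the abstract — a sparse set of joint Gaussians carrying motion, with dense skin Gaussians anchored to them at the first frame and driven forward as a coarse motion prediction — can be sketched as follows. This is an illustrative sketch only: the function names, the k-nearest-neighbor anchoring, and the inverse-distance blend weights are assumptions for exposition, not the paper's exact formulation, and the fine-grained per-frame optimization stage is omitted.

```python
import numpy as np

# Hypothetical sketch of the dual-Gaussian idea: a small set of "joint"
# Gaussians carries motion, while many "skin" Gaussians carry appearance
# and are anchored to nearby joints at the first frame. Only positions
# are modeled here; real Gaussians also carry rotation, scale, opacity,
# and spherical-harmonic color.

def anchor_skin_to_joints(skin_pos, joint_pos, k=4):
    """For each skin Gaussian, find its k nearest joint Gaussians and
    compute inverse-distance blend weights (frame-0 anchoring)."""
    d = np.linalg.norm(skin_pos[:, None, :] - joint_pos[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                  # (N, k) nearest joints
    w = 1.0 / (np.take_along_axis(d, idx, axis=1) + 1e-8)
    w /= w.sum(axis=1, keepdims=True)                   # normalize weights
    offsets = skin_pos[:, None, :] - joint_pos[idx]     # frame-0 local offsets
    return idx, w, offsets

def drive_skin(joint_pos_t, idx, w, offsets):
    """Predict skin positions at frame t by blending the anchored joints
    (the coarse-alignment step; fine optimization would refine this)."""
    return ((joint_pos_t[idx] + offsets) * w[..., None]).sum(axis=1)

# Toy example: anchoring at frame 0, then driving with unmoved joints
rng = np.random.default_rng(0)
joints0 = rng.normal(size=(16, 3))
skin0 = rng.normal(size=(200, 3))
idx, w, off = anchor_skin_to_joints(skin0, joints0)
assert np.allclose(drive_skin(joints0, idx, w, off), skin0, atol=1e-5)
```

Because only the ~16 joint trajectories change per frame while the per-skin weights and offsets stay fixed after anchoring, the motion stream is small and highly compressible, which is consistent with the entropy-coded motion and ~350 KB-per-frame figure quoted above.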


November 16, 2024