Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Abstract

Video generative models are receiving particular attention for their ability to generate realistic and imaginative frames. These models have also been observed to exhibit strong 3D consistency, which significantly enhances their potential to act as world simulators. In this work, we present Vidu4D, a novel reconstruction model that accurately reconstructs 4D (i.e., sequential 3D) representations from single generated videos, addressing challenges associated with non-rigidity and frame distortion. This capability is pivotal for creating high-fidelity virtual content that maintains both spatial and temporal coherence. At the core of Vidu4D is our proposed Dynamic Gaussian Surfels (DGS) technique. DGS optimizes time-varying warping functions to transform Gaussian surfels (surface elements) from a static state to a dynamically warped state. This transformation enables a precise depiction of motion and deformation over time. To preserve the structural integrity of surface-aligned Gaussian surfels, we design a warped-state geometric regularization that uses continuous warping fields to estimate normals. Additionally, we learn refinements of the rotation and scaling parameters of Gaussian surfels, which greatly alleviates texture flickering during warping and enhances the capture of fine-grained appearance details. Vidu4D also contains a novel initialization state that provides a proper start for the warping fields in DGS. When Vidu4D is equipped with an existing video generative model, the overall framework demonstrates high-fidelity text-to-4D generation in both appearance and geometry.
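To make the idea of a time-varying warping function concrete, the following is a minimal sketch, not the Vidu4D implementation. It assumes a purely hypothetical warp (a rotation about the z-axis plus a translation, both linear in time) and shows how a single surfel's center and normal could be mapped from the static state to a warped state at time `t`. The function names, the `angular_speed` and `velocity` parameters, and the rigid form of the warp are all illustrative assumptions; the abstract does not specify how the warping fields are parameterized.

```python
import math

def rotation_z(angle):
    """Return a 3x3 rotation matrix about the z-axis as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def warp_surfel(center, normal, t,
                angular_speed=math.pi / 2, velocity=(0.0, 0.0, 1.0)):
    """Map one surfel from the static state to its warped state at time t.

    Hypothetical warp (an assumption for illustration only):
    rotate about z by angular_speed * t, then translate by velocity * t.
    """
    r = rotation_z(angular_speed * t)
    warped_center = [c + v * t for c, v in zip(matvec(r, center), velocity)]
    # Normals are directions: they rotate with the surfel but do not translate.
    warped_normal = matvec(r, normal)
    return warped_center, warped_normal

# At t = 0 the warp is the identity; at t = 1 the surfel has turned 90 degrees
# about z and moved one unit along z.
static_center, static_normal = warp_surfel([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], t=0.0)
center, normal = warp_surfel([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], t=1.0)
```

In this toy setting the warped normal follows directly from the same transform that moves the center, which is the general reason a continuous warping field can be used to estimate normals in the warped state.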
