
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos

October 7, 2025
Authors: Jiraphon Yenphraphai, Ashkan Mirzaei, Jianqi Chen, Jiaxu Zou, Sergey Tulyakov, Raymond A. Yeh, Peter Wonka, Chaoyang Wang
cs.AI

Abstract

Video-conditioned 4D shape generation aims to recover time-varying 3D geometry and view-consistent appearance directly from an input video. In this work, we introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Built on large-scale pre-trained 3D models, our framework introduces three key components: (i) a temporal attention mechanism that conditions generation on all frames while producing a time-indexed dynamic representation; (ii) time-aware point sampling and 4D latent anchoring that promote temporally consistent geometry and texture; and (iii) noise sharing across frames to enhance temporal stability. Our method accurately captures non-rigid motion, volume changes, and even topological transitions without per-frame optimization. Across diverse in-the-wild videos, our method improves robustness and perceptual fidelity and reduces failure modes compared with the baselines.
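
To make component (i) concrete, below is a minimal sketch (not the authors' code) of temporal attention over per-frame latent tokens. The (B, T, N, C) layout, the TemporalAttention name, and the residual wiring are assumptions for exposition; the abstract only states that attention conditions generation on all frames.

```python
# A minimal sketch of temporal attention, assuming per-frame latent
# tokens shaped (B, T, N, C): batch, frames, tokens per frame, channels.
# Folding the token axis into the batch restricts attention to the
# T (time) axis, so each token aggregates information from all frames.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, N, C = x.shape
        h = x.permute(0, 2, 1, 3).reshape(B * N, T, C)   # (B*N, T, C)
        out, _ = self.attn(self.norm(h), self.norm(h), self.norm(h))
        h = h + out                                       # residual connection
        return h.reshape(B, N, T, C).permute(0, 2, 1, 3)  # back to (B, T, N, C)
```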
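For component (iii), one plausible reading is that every frame's latent is initialized from the same Gaussian noise before denoising, so frames start from an identical state. The sketch below illustrates that idea; the shapes and the shared_initial_noise name are hypothetical, not identifiers from the paper.

```python
# A minimal sketch of noise sharing across frames, assuming a diffusion
# sampler that starts each frame's latent from Gaussian noise. Drawing
# the noise once and repeating it over the T frame axis keeps the
# initial state identical across frames, which promotes temporal
# stability of the generated 4D shape.
import torch

def shared_initial_noise(batch: int, frames: int,
                         tokens: int, dim: int) -> torch.Tensor:
    noise = torch.randn(batch, 1, tokens, dim)  # sample one noise tensor
    return noise.repeat(1, frames, 1, 1)        # share it across all frames
```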