
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos

October 7, 2025
Authors: Jiraphon Yenphraphai, Ashkan Mirzaei, Jianqi Chen, Jiaxu Zou, Sergey Tulyakov, Raymond A. Yeh, Peter Wonka, Chaoyang Wang
cs.AI

Abstract

Video-conditioned 4D shape generation aims to recover time-varying 3D geometry and view-consistent appearance directly from an input video. In this work, we introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Built on large-scale pre-trained 3D models, our framework introduces three key components: (i) a temporal attention mechanism that conditions generation on all frames while producing a time-indexed dynamic representation; (ii) time-aware point sampling and 4D latent anchoring that promote temporally consistent geometry and texture; and (iii) noise sharing across frames to enhance temporal stability. Our method accurately captures non-rigid motion, volume changes, and even topological transitions without per-frame optimization. Across diverse in-the-wild videos, our method improves robustness and perceptual fidelity and reduces failure modes compared with the baselines.
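
To make component (i) concrete, below is a minimal sketch (not the authors' code) of temporal attention over per-frame latent tokens. The (B, T, N, C) layout, the TemporalAttention name, and the residual wiring are assumptions for exposition; the abstract only states that attention conditions generation on all frames.

```python
# A minimal sketch of temporal attention, assuming per-frame latent
# tokens shaped (B, T, N, C): batch, frames, tokens per frame, channels.
# Folding the token axis into the batch restricts attention to the
# T (time) axis, so each token aggregates information from all frames.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, N, C = x.shape
        h = x.permute(0, 2, 1, 3).reshape(B * N, T, C)   # (B*N, T, C)
        out, _ = self.attn(self.norm(h), self.norm(h), self.norm(h))
        h = h + out                                       # residual connection
        return h.reshape(B, N, T, C).permute(0, 2, 1, 3)  # back to (B, T, N, C)
```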
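For component (iii), one plausible reading is that every frame's latent is initialized from the same Gaussian noise before denoising, so frames start from an identical state. The sketch below illustrates that idea; the shapes and the shared_initial_noise name are hypothetical, not identifiers from the paper.

```python
# A minimal sketch of noise sharing across frames, assuming a diffusion
# sampler that starts each frame's latent from Gaussian noise. Drawing
# the noise once and repeating it over the T frame axis keeps the
# initial state identical across frames, which promotes temporal
# stability of the generated 4D shape.
import torch

def shared_initial_noise(batch: int, frames: int,
                         tokens: int, dim: int) -> torch.Tensor:
    noise = torch.randn(batch, 1, tokens, dim)  # sample one noise tensor
    return noise.repeat(1, frames, 1, 1)        # share it across all frames
```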