Vista4D: Video Reshooting Based on 4D Point Clouds

Abstract

We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and the target cameras in a 4D point cloud. Given an input video, our method re-synthesizes the scene with the same dynamics from a different camera trajectory and viewpoint. Existing video reshooting methods often struggle with depth-estimation artifacts in real-world dynamic videos, fail to preserve content appearance, and lose precise camera control on challenging new trajectories. We build a 4D-grounded point cloud representation using static pixel segmentation and 4D reconstruction, which explicitly preserves seen content and provides rich camera signals. We further train on reconstructed multiview dynamic data so that the model remains robust to point cloud artifacts during real-world inference. Our results show improved 4D consistency, camera control, and visual quality over state-of-the-art baselines across a variety of videos and camera paths. Moreover, our method generalizes to real-world applications such as dynamic scene expansion and 4D scene recomposition. See our project page for results, code, and models: https://eyeline-labs.github.io/Vista4D
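The abstract only states that the input video and target cameras share a 4D point cloud in which static pixels are treated separately from dynamic ones. As a rough illustration of that idea, and not the authors' implementation, the sketch below assumes that per-frame depth maps, shared pinhole intrinsics `K`, 4x4 camera-to-world poses, and binary static masks are already available from a 4D reconstruction step. The function names `unproject`, `point_cloud_at`, and `render` are hypothetical. Static pixels are accumulated from every frame, dynamic pixels come only from the frame being reshot, and the cloud is splatted into a target camera with a per-pixel nearest-depth test.

```python
import numpy as np


def unproject(depth, K, cam_to_world):
    """Lift an (H, W) depth map to world-space points of shape (H*W, 3)."""
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T          # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)    # scale each ray by its depth
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    return (pts_h @ cam_to_world.T)[:, :3]   # camera -> world


def point_cloud_at(t, depths, colors, static_masks, K, poses):
    """Hypothetical 4D cloud at time t: static pixels of every frame, all pixels of frame t."""
    pts, cols = [], []
    for i, (d, c, m, pose) in enumerate(zip(depths, colors, static_masks, poses)):
        keep = np.ones(d.size, dtype=bool) if i == t else m.reshape(-1)
        pts.append(unproject(d, K, pose)[keep])
        cols.append(c.reshape(-1, 3)[keep])
    return np.concatenate(pts), np.concatenate(cols)


def render(points, colors, K, world_to_cam, H, W):
    """Splat points (one pixel each) into a target camera, keeping the nearest per pixel."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (pts_h @ world_to_cam.T)[:, :3]    # world -> target camera
    front = cam[:, 2] > 1e-6                 # drop points behind the camera
    cam, colors = cam[front], colors[front]
    proj = cam @ K.T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    lin = v[inside] * W + u[inside]          # flat pixel index
    z, c = cam[inside, 2], colors[inside]
    order = np.lexsort((z, lin))             # sort by pixel, then by depth (nearest first)
    _, first = np.unique(lin[order], return_index=True)
    winners = order[first]                   # nearest point for each covered pixel
    image = np.zeros((H * W, 3))
    mask = np.zeros(H * W, dtype=bool)
    image[lin[winners]] = c[winners]
    mask[lin[winners]] = True
    return image.reshape(H, W, 3), mask.reshape(H, W)
```

Rendering `point_cloud_at(t, ...)` with `render` for each target pose yields a view-aligned image plus a coverage mask; uncovered pixels mark content that was never observed and would have to be synthesized. The z-buffer uses a sort and `np.unique` rather than repeated fancy-index assignment, because NumPy does not guarantee which write wins when an index repeats. One-pixel splats also leave gaps when the target camera moves closer to the scene, which is one source of the point cloud artifacts the abstract mentions.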