

SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

December 31, 2025
Authors: Zhening Huang, Hyeonho Jeong, Xuelin Chen, Yulia Gryaditskaya, Tuanfeng Y. Wang, Joan Lasenby, Chun-Hao Huang
cs.AI

Abstract

We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePilot can independently alter the camera viewpoint and the motion sequence within the generative process, re-rendering the scene for continuous and arbitrary exploration across space and time. To achieve this, we introduce an effective animation time-embedding mechanism in the diffusion process, allowing explicit control of the output video's motion sequence with respect to that of the source video. As no datasets provide paired videos of the same dynamic scene with continuous temporal variations, we propose a simple yet effective temporal-warping training scheme that repurposes existing multi-view datasets to mimic temporal differences. This strategy effectively supervises the model to learn temporal control and achieve robust space-time disentanglement. To further enhance the precision of dual control, we introduce two additional components: an improved camera-conditioning mechanism that allows altering the camera from the first frame, and CamxTime, the first synthetic space-and-time full-coverage rendering dataset that provides fully free space-time video trajectories within a scene. Joint training on the temporal-warping scheme and the CamxTime dataset yields more precise temporal control. We evaluate SpaceTimePilot on both real-world and synthetic data, demonstrating clear space-time disentanglement and strong results compared to prior work. Project page: https://zheninghuang.github.io/Space-Time-Pilot/ Code: https://github.com/ZheningHuang/spacetimepilot
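The abstract describes conditioning the diffusion process on an "animation time" signal so the output video's motion sequence can be controlled relative to the source. The paper does not specify the embedding here; the sketch below is a hypothetical illustration using a standard sinusoidal embedding of a normalized animation time `t` in `[0, 1]`, with `animation_time_embedding`, `dim`, and `max_period` being assumed names and parameters, not the authors' implementation.

```python
import math

def animation_time_embedding(t, dim=8, max_period=10000.0):
    """Sinusoidal embedding of a normalized animation time t in [0, 1].

    Hypothetical sketch: SpaceTimePilot conditions the diffusion model on an
    animation-time signal; the exact mechanism may differ from this example.
    """
    half = dim // 2
    # Geometrically spaced frequencies, as in standard timestep embeddings.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(t * w) for w in freqs] + [math.sin(t * w) for w in freqs]

# Per-frame animation times for a 16-frame output clip that replays the
# source motion at half speed (an example of explicit temporal control).
times = [0.5 * i / 15 for i in range(16)]
embeddings = [animation_time_embedding(t) for t in times]
```

Each frame of the generated clip would receive its own embedding, so slowing down, reversing, or freezing the source motion reduces to choosing a different schedule of `times`.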