SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
December 31, 2025
Authors: Zhening Huang, Hyeonho Jeong, Xuelin Chen, Yulia Gryaditskaya, Tuanfeng Y. Wang, Joan Lasenby, Chun-Hao Huang
cs.AI
Abstract
We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePilot can independently alter the camera viewpoint and the motion sequence within the generative process, re-rendering the scene for continuous and arbitrary exploration across space and time. To achieve this, we introduce an effective animation time-embedding mechanism in the diffusion process, allowing explicit control of the output video's motion sequence with respect to that of the source video. As no datasets provide paired videos of the same dynamic scene with continuous temporal variations, we propose a simple yet effective temporal-warping training scheme that repurposes existing multi-view datasets to mimic temporal differences. This strategy effectively supervises the model to learn temporal control and achieve robust space-time disentanglement. To further enhance the precision of dual control, we introduce two additional components: an improved camera-conditioning mechanism that allows altering the camera from the first frame, and CamxTime, the first synthetic space-and-time full-coverage rendering dataset that provides fully free space-time video trajectories within a scene. Joint training on the temporal-warping scheme and the CamxTime dataset yields more precise temporal control. We evaluate SpaceTimePilot on both real-world and synthetic data, demonstrating clear space-time disentanglement and strong results compared to prior work. Project page: https://zheninghuang.github.io/Space-Time-Pilot/ Code: https://github.com/ZheningHuang/spacetimepilot
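The abstract gives no implementation details, but the temporal-warping training scheme (repurposing a synchronized multi-view dataset to mimic temporal differences between paired videos) lends itself to a short sketch. The snippet below is a minimal, hypothetical construction of one training pair, assuming the multi-view clip is a synchronized array of shape (views, frames, H, W, C); all function and parameter names (temporal_warp_indices, make_training_pair, speed, offset) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def temporal_warp_indices(num_frames: int, speed: float = 1.0,
                          offset: float = 0.0) -> np.ndarray:
    """Map output frame indices to (fractional) source-frame times.

    A hypothetical linear re-indexing, t_out -> clip(speed * t_out + offset),
    used to mimic a temporal difference between two views of one scene.
    """
    t = np.arange(num_frames, dtype=np.float32)
    warped = speed * t + offset
    return np.clip(warped, 0.0, num_frames - 1)

def make_training_pair(multiview_clip: np.ndarray, src_view: int,
                       tgt_view: int, speed: float, offset: float):
    """Build a (source video, target video, time signal) triple from a
    synchronized multi-view clip of shape (views, frames, H, W, C)."""
    n = multiview_clip.shape[1]
    warped_t = temporal_warp_indices(n, speed, offset)
    # Source keeps its original timing; target takes a new viewpoint
    # with re-indexed (warped) frames.
    src = multiview_clip[src_view]
    tgt = multiview_clip[tgt_view, np.round(warped_t).astype(int)]
    # Normalized per-frame animation time, i.e. the kind of signal an
    # animation time-embedding could condition the diffusion model on.
    time_signal = warped_t / max(n - 1, 1)
    return src, tgt, time_signal
```

The point of such a construction is that the target clip shows the same scene from a different viewpoint on a re-indexed timeline, giving the model explicit per-frame supervision for relating the output motion sequence to that of the source, which is what the paper's animation time-embedding is said to control.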