ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling
December 1, 2025
Authors: Qisen Wang, Yifan Zhao, Peisen Shen, Jialu Li, Jia Li
cs.AI
Abstract
Although prevailing camera-controlled video generation models can produce cinematic results, directly lifting them to the generation of 3D-consistent, high-fidelity, time-synchronized multi-view videos remains challenging, and this is a pivotal capability for taming 4D worlds. Some works resort to data augmentation or test-time optimization, but these strategies are constrained by limited model generalization and scalability issues. To this end, we propose ChronosObserver, a training-free method comprising a World State Hyperspace, which represents the spatiotemporal constraints of a 4D world scene, and Hyperspace Guided Sampling, which uses that hyperspace to synchronize the diffusion sampling trajectories of multiple views. Experimental results demonstrate that our method achieves high-fidelity, 3D-consistent, time-synchronized multi-view video generation without any training or fine-tuning of diffusion models.
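The abstract does not specify the algorithm, but the core idea, coupling the sampling trajectories of several views through a shared world-state representation at every denoising step, can be illustrated with a deliberately simplified sketch. Everything below is hypothetical: `denoise_step` is a stand-in for a real per-view diffusion denoiser, and the cross-view consensus used as the "world state" is only a toy proxy for the paper's World State Hyperspace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): V camera views, D-dim latents, T sampling steps.
V, D, T = 4, 16, 50
latents = rng.normal(size=(V, D))   # independent per-view starting noise
guidance_scale = 0.2                # strength of the hyperspace guidance term

def denoise_step(x):
    """Placeholder per-view denoiser: relaxes the latent toward a fixed
    'scene' signal, standing in for one step of a diffusion sampler."""
    scene = np.ones_like(x)
    return x + 0.1 * (scene - x)

for t in range(T):
    # 1) Each view takes its own, otherwise independent, sampling step.
    latents = np.stack([denoise_step(latents[v]) for v in range(V)])
    # 2) Toy "world state": the cross-view consensus at this step
    #    (a crude proxy for constraints encoded in a world-state hyperspace).
    world_state = latents.mean(axis=0)
    # 3) Guided correction: pull every trajectory toward the shared state,
    #    synchronizing the views without training or fine-tuning any model.
    latents += guidance_scale * (world_state - latents)

# In this toy setup, the views converge to mutually consistent latents.
spread = float(np.abs(latents - latents.mean(axis=0)).max())
```

The point of the sketch is only the control flow: per-view sampling steps interleaved with a correction derived from a shared state, which is what makes the approach training-free.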