Holo-World: Unified Camera, Object and Weather Control for Video World Model

June 18, 2026
Authors: Xiangchen Yin, Wenzhang Sun, Jiahui Yuan, Zijie Liu, Yinda Chen, Wei Li, Dachun Kai, Chunfeng Wang, Xiaoyan Sun
cs.AI

Abstract

Video world models are moving toward preserving an observed world under controllable camera and object motion while allowing its environmental state to change. Yet these controls remain isolated from one another, and weather generation typically relies on a source video or reconstructed scene that already specifies future structure. We study a first-frame-anchored source-to-state setting, in which the model starts from a single image, follows explicit camera and object controls plus an optional weather instruction, and generates a video that either preserves the source world or transfers it to a target weather state. To address these challenges, we first build HoloStateData, a state video dataset that turns diverse videos into unified control samples for camera, object, and weather supervision. Second, we introduce Holo-World, a unified controllable video world model that jointly controls the scene from a single image. Its Unified Scene Adapter factorizes world preservation and weather transfer into distinct parameter subspaces, using rendered background, geometry buffers, and object controls to maintain controlled scene structure while modeling weather-dependent appearance and particle effects. Additionally, Scene-Weather Decomposed CFG guides the scene and weather residuals separately, strengthening target weather effects without over-amplifying the full condition. Quantitative and qualitative experiments demonstrate that Holo-World maintains precise camera and object control with consistent scene structure while transferring scenes into diverse target weather states, outperforming video-to-video weather editing baselines on weather-state generation. Our project page is available at https://xiangchenyin.github.io/Holo-World/.
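
The abstract does not give the formula for Scene-Weather Decomposed CFG, so the following is only a minimal sketch of one common way to split classifier-free guidance into two residuals, each with its own scale. The function name, the three-prediction setup (unconditional, scene-only, scene plus weather), and the weights `w_scene` and `w_weather` are all assumptions made for illustration, not the authors' implementation.

```python
def decomposed_cfg(eps_uncond, eps_scene, eps_full, w_scene, w_weather):
    """Hypothetical two-residual classifier-free guidance.

    eps_uncond: denoiser output with no conditioning
    eps_scene:  denoiser output with scene controls only (assumed setup)
    eps_full:   denoiser output with scene controls and weather instruction
    Works on floats, NumPy arrays, or tensors, since it only uses + - *.
    """
    # Residual contributed by the scene controls alone.
    scene_residual = eps_scene - eps_uncond
    # Residual contributed by adding the weather instruction on top.
    weather_residual = eps_full - eps_scene
    # Scale each residual independently instead of the full condition.
    return eps_uncond + w_scene * scene_residual + w_weather * weather_residual


# Toy scalar predictions, chosen only to make the arithmetic easy to follow.
guided = decomposed_cfg(0.0, 1.0, 3.0, w_scene=1.5, w_weather=4.0)
```

With equal weights this reduces to ordinary CFG on the full condition, `eps_uncond + w * (eps_full - eps_uncond)`. Raising only `w_weather` strengthens the weather term while leaving the scene term at its own scale, which is the behavior the abstract describes.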