Light-X: Generative 4D Video Rendering with Camera and Illumination Control
December 4, 2025
Authors: Tianqi Liu, Zhaoxi Chen, Zihao Huang, Shaocong Xu, Saining Zhang, Chongjie Ye, Bohan Li, Zhiguo Cao, Wei Li, Hao Zhao, Ziwei Liu
cs.AI
Abstract
Recent advances in illumination control extend image-based methods to video, yet they still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key step toward generative modeling of real-world scenes is the joint control of camera trajectory and illumination, since visual dynamics are inherently shaped by both geometry and lighting. To this end, we present Light-X, a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. 1) We propose a disentangled design that decouples geometry and lighting signals: geometry and motion are captured via dynamic point clouds projected along user-defined camera trajectories, while illumination cues are provided by a relit frame consistently projected into the same geometry. These explicit, fine-grained cues enable effective disentanglement and guide high-quality illumination generation. 2) To address the lack of paired multi-view and multi-illumination videos, we introduce Light-Syn, a degradation-based pipeline with inverse mapping that synthesizes training pairs from in-the-wild monocular footage. This strategy yields a dataset covering static, dynamic, and AI-generated scenes, ensuring robust training. Extensive experiments show that Light-X outperforms baseline methods in joint camera-illumination control and surpasses prior video relighting methods under both text- and background-conditioned settings.
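To make the disentangled design more concrete, the following is a minimal sketch of how a colored point cloud could be projected into a user-defined camera, once with the source colors (geometry cue) and once with relit colors (illumination cue). It is an illustration under stated assumptions, not the authors' implementation: the function `render_points`, the pinhole camera model, the nearest-pixel z-buffer, and all numeric values are hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[int, int, int]
Image = List[List[Optional[Color]]]


def render_points(
    points: Sequence[Vec3],
    colors: Sequence[Color],
    rotation: Sequence[Sequence[float]],
    translation: Vec3,
    focal: float,
    width: int,
    height: int,
) -> Image:
    """Project colored 3D points into a pinhole camera (hypothetical helper).

    Pixels that no point lands on stay None; these are the holes a
    generative model would have to fill in.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0  # assumed principal point
    image: Image = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for point, color in zip(points, colors):
        # World -> camera coordinates: p_cam = R * p_world + t
        xc, yc, zc = (
            sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        if zc <= 0:  # behind the camera
            continue
        u = round(focal * xc / zc + cx)
        v = round(focal * yc / zc + cy)
        # Z-buffer: keep only the nearest point per pixel
        if 0 <= u < width and 0 <= v < height and zc < depth[v][u]:
            depth[v][u] = zc
            image[v][u] = color
    return image


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 4.0), (1.0, 0.0, 2.0)]
source_colors = [(200, 50, 50), (50, 200, 50), (50, 50, 200)]
relit_colors = [(255, 120, 90), (90, 255, 120), (90, 120, 255)]  # made-up relit values

# Source view: the second point is hidden behind the first.
geometry_cue = render_points(points, source_colors, identity, (0.0, 0.0, 0.0), 4.0, 5, 5)
lighting_cue = render_points(points, relit_colors, identity, (0.0, 0.0, 0.0), 4.0, 5, 5)

# Novel view (camera shifted sideways): the hidden point becomes visible.
novel_geometry = render_points(points, source_colors, identity, (-1.0, 0.0, 0.0), 4.0, 5, 5)
novel_lighting = render_points(points, relit_colors, identity, (-1.0, 0.0, 0.0), 4.0, 5, 5)
```

Because both renders share the same points and camera, the two cue images cover exactly the same pixels and differ only in color, which is the sense in which geometry and lighting signals stay separable in this sketch.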