

Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

August 18, 2025
作者: Jianshu Zeng, Yuxuan Liu, Yutong Feng, Chenxuan Miao, Zixiang Gao, Jiwang Qu, Jianzhang Zhang, Bin Wang, Kun Yuan
cs.AI

Abstract
Video relighting is a challenging yet valuable task that aims to replace the background in a video while correspondingly adjusting the lighting of the foreground for harmonious blending. During this translation, it is essential to preserve the original properties of the foreground, e.g., albedo, and to propagate consistent relighting across temporal frames. In this paper, we propose Lumen, an end-to-end video relighting framework built on large-scale video generative models, which accepts flexible textual descriptions to control lighting and background. Given the scarcity of high-quality paired videos showing the same foreground under various lighting conditions, we construct a large-scale dataset mixing realistic and synthetic videos. For the synthetic domain, benefiting from the abundant 3D assets in the community, we leverage an advanced 3D rendering engine to curate video pairs in diverse environments. For the realistic domain, we adopt an HDR-based lighting simulation to compensate for the lack of paired in-the-wild videos. Powered by this dataset, we design a joint training curriculum that effectively exploits the strengths of each domain, i.e., the physical consistency of synthetic videos and the generalized domain distribution of realistic videos. To implement this, we inject a domain-aware adapter into the model to decouple the learning of relighting from that of the domain appearance distribution. We construct a comprehensive benchmark to evaluate Lumen together with existing methods from the perspectives of foreground preservation and video consistency. Experimental results demonstrate that Lumen effectively edits the input into cinematic relit videos with consistent lighting and strict foreground preservation. Our project page: https://lumen-relight.github.io/
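The abstract's domain-aware adapter decouples relighting from domain appearance by routing features through a small domain-specific branch while the backbone stays shared. The paper does not give implementation details, so the following is only a minimal illustrative sketch (all names and the low-rank bottleneck design are assumptions, not from the paper): a backbone feature vector receives a low-rank residual whose weights are selected by a domain id (synthetic vs. realistic).

```python
import numpy as np

# Hypothetical sketch of a domain-aware adapter; the low-rank residual
# design and all names here are illustrative assumptions, not the
# paper's actual architecture.
rng = np.random.default_rng(0)
DIM, BOTTLENECK = 8, 4

# One down/up projection pair per domain: 0 = synthetic, 1 = realistic.
adapters = {
    d: (rng.standard_normal((DIM, BOTTLENECK)) * 0.02,
        rng.standard_normal((BOTTLENECK, DIM)) * 0.02)
    for d in (0, 1)
}

def domain_aware_adapter(h: np.ndarray, domain: int) -> np.ndarray:
    """Add a low-rank, domain-specific residual to shared backbone features."""
    down, up = adapters[domain]
    residual = np.maximum(h @ down, 0.0) @ up  # ReLU bottleneck
    return h + residual

h = rng.standard_normal((2, DIM))           # a toy batch of features
out_syn = domain_aware_adapter(h, domain=0)
out_real = domain_aware_adapter(h, domain=1)
```

Because only the adapter weights differ per domain, the shared backbone can learn relighting from both data sources while each adapter absorbs its domain's appearance statistics, which matches the decoupling motivation stated in the abstract.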