Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models
August 18, 2025
Authors: Jianshu Zeng, Yuxuan Liu, Yutong Feng, Chenxuan Miao, Zixiang Gao, Jiwang Qu, Jianzhang Zhang, Bin Wang, Kun Yuan
cs.AI
Abstract
Video relighting is a challenging yet valuable task that aims to replace the
background in videos while correspondingly adjusting the foreground lighting
so the two blend harmoniously. During this translation, it is essential to
preserve the original properties of the foreground, e.g., albedo, and to
propagate consistent relighting across temporal frames. In this paper, we
propose Lumen, an end-to-end video relighting framework developed on
large-scale video generative models, which accepts flexible textual
descriptions for controlling the lighting and background. Considering the
scarcity of high-quality paired videos showing the same foreground under
various lighting conditions, we construct a large-scale dataset mixing
realistic and synthetic videos. For the synthetic domain, benefiting from the
abundant 3D assets in the community, we leverage an advanced 3D rendering
engine to curate video pairs in diverse environments. For the realistic
domain, we adopt an HDR-based lighting simulation to compensate for the lack
of paired in-the-wild videos. Powered by this dataset, we design a joint
training curriculum that effectively unleashes the strengths of each domain,
i.e., the physical consistency of synthetic videos and the generalized domain
distribution of realistic videos. To implement this, we inject a domain-aware
adapter into the model to decouple the learning of relighting from that of
domain appearance distribution. We construct a comprehensive benchmark to
evaluate Lumen alongside existing methods from the perspectives of foreground
preservation and video consistency. Experimental results demonstrate that
Lumen effectively edits the input into cinematic relit videos with consistent
lighting and strict foreground preservation. Our project page:
https://lumen-relight.github.io/
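The abstract mentions HDR-based lighting simulation as a way to create paired realistic training videos. The paper does not give its procedure, but a deliberately crude sketch of the general idea is to derive a global light color from an HDR environment map and use it to modulate a foreground frame; the function name and the mean-radiance approximation below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def simulate_relight(frame: np.ndarray, hdr_env: np.ndarray,
                     exposure: float = 1.0) -> np.ndarray:
    """Toy HDR-based lighting simulation (illustrative assumption only):
    tint and scale a uint8 RGB frame by the mean radiance of an HDR
    environment map, approximating relighting under a new environment.
    A real pipeline would use directional sampling and shading, not a
    single global tint."""
    # Mean RGB radiance of the environment acts as a global light color.
    light = hdr_env.reshape(-1, 3).mean(axis=0)       # shape (3,)
    light = light / (light.max() + 1e-8)              # normalize tint to [0, 1]
    relit = frame.astype(np.float32) * light * exposure
    return np.clip(np.rint(relit), 0, 255).astype(np.uint8)
```

For example, a uniform white environment map leaves the frame essentially unchanged, while a red-dominated map suppresses the green and blue channels.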
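The abstract also describes a domain-aware adapter that decouples relighting from domain appearance during joint training on synthetic and realistic data. The paper's architecture is not specified here; a minimal PyTorch sketch of one common realization, a per-domain residual bottleneck selected by a domain id (all names and shapes are assumptions), could look like:

```python
import torch
import torch.nn as nn

class DomainAwareAdapter(nn.Module):
    """Toy domain-aware adapter (assumed design, not the paper's):
    a small residual bottleneck MLP per domain, so domain-specific
    appearance is absorbed by the adapter while the shared backbone
    focuses on relighting."""
    def __init__(self, dim: int, bottleneck: int = 64, num_domains: int = 2):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim, bottleneck),
                nn.GELU(),
                nn.Linear(bottleneck, dim),
            )
            for _ in range(num_domains)
        )

    def forward(self, x: torch.Tensor, domain_id: int) -> torch.Tensor:
        # Residual form: backbone features pass through unchanged,
        # plus a domain-conditioned correction.
        return x + self.adapters[domain_id](x)
```

At inference on real footage, one would route through the realistic-domain adapter (e.g., `domain_id=1`), so synthetic-only appearance biases learned during training stay isolated in the other branch.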