UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
November 3, 2025
Authors: Ropeway Liu, Hangjie Yuan, Bo Dong, Jiazheng Xing, Jinwang Wang, Rui Zhao, Yan Xing, Weihua Chen, Fan Wang
cs.AI
Abstract
Relighting is a crucial task with both practical demand and artistic value,
and recent diffusion models have shown strong potential by enabling rich and
controllable lighting effects. However, as they are typically optimized in
semantic latent space, where proximity does not guarantee physical correctness
in visual space, they often produce unrealistic results, such as overexposed
highlights, misaligned shadows, and incorrect occlusions. We address this with
UniLumos, a unified relighting framework for both images and videos that brings
RGB-space geometry feedback into a flow matching backbone. By supervising the
model with depth and normal maps extracted from its outputs, we explicitly
align lighting effects with the scene structure, enhancing physical
plausibility. Nevertheless, this feedback requires high-quality outputs for
supervision in visual space, making standard multi-step denoising
computationally expensive. To mitigate this, we employ path consistency
learning, allowing supervision to remain effective even under few-step training
regimes. To enable fine-grained relighting control and supervision, we design a
structured six-dimensional annotation protocol capturing core illumination
attributes. Building upon this, we propose LumosBench, a disentangled
attribute-level benchmark that evaluates lighting controllability via large
vision-language models, enabling automatic and interpretable assessment of
relighting precision across individual dimensions. Extensive experiments
demonstrate that UniLumos achieves state-of-the-art relighting quality with
significantly improved physical consistency, while delivering a 20x speedup for
both image and video relighting. Code is available at
https://github.com/alibaba-damo-academy/Lumos-Custom.
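The geometry feedback described above compares the structure recovered from the model's own RGB output with the structure of the reference scene. The sketch below is a minimal illustration under stated assumptions, not the UniLumos implementation. It assumes depth maps have already been extracted from the generated and reference frames by some depth estimator (not shown), derives surface normals from depth with finite differences, and combines an L1 depth term with a (1 - cosine) normal term. The function names and the weights `w_depth` and `w_normal` are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]            # depth map: rows of per-pixel depth values
Vec3 = Tuple[float, float, float]   # unit surface normal


def normals_from_depth(depth: Grid) -> List[List[Vec3]]:
    """Estimate unit normals of the surface z = depth(x, y) with finite differences.

    Interior pixels use central differences; border pixels fall back to
    one-sided differences so the output has the same shape as the input.
    """
    h, w = len(depth), len(depth[0])
    normals: List[List[Vec3]] = []
    for y in range(h):
        row: List[Vec3] = []
        for x in range(w):
            xa, xb = max(x - 1, 0), min(x + 1, w - 1)
            ya, yb = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][xb] - depth[y][xa]) / max(xb - xa, 1)
            dzdy = (depth[yb][x] - depth[ya][x]) / max(yb - ya, 1)
            # (-dz/dx, -dz/dy, 1) is normal to z = depth(x, y); normalize it.
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            row.append((-dzdx / norm, -dzdy / norm, 1.0 / norm))
        normals.append(row)
    return normals


def geometry_feedback_loss(pred_depth: Grid, ref_depth: Grid,
                           w_depth: float = 1.0, w_normal: float = 1.0) -> float:
    """Hypothetical geometry loss: L1 on depth plus (1 - cosine) on normals.

    pred_depth is assumed to come from a depth estimator run on the relit
    output, ref_depth from the reference frame; both have the same shape.
    """
    n = len(pred_depth) * len(pred_depth[0])
    depth_l1 = sum(abs(p - r)
                   for prow, rrow in zip(pred_depth, ref_depth)
                   for p, r in zip(prow, rrow)) / n
    pred_n, ref_n = normals_from_depth(pred_depth), normals_from_depth(ref_depth)
    # Normals are unit length, so their dot product is the cosine similarity.
    normal_term = sum(1.0 - sum(a * b for a, b in zip(u, v))
                      for prow, rrow in zip(pred_n, ref_n)
                      for u, v in zip(prow, rrow)) / n
    return w_depth * depth_l1 + w_normal * normal_term


# Toy 3x3 depth maps.
flat = [[0.0] * 3 for _ in range(3)]
raised = [[0.5] * 3 for _ in range(3)]                     # shifted in depth only
tilted = [[float(x) for x in range(3)] for _ in range(3)]  # plane sloping along x

# geometry_feedback_loss(raised, flat) == 0.5: depth differs, normals agree.
# geometry_feedback_loss(tilted, flat) is about 1.293: both terms are nonzero.
```

Deriving normals from depth keeps the example dependency-free; running a dedicated normal estimator on the RGB output would be an equally reasonable source for the normal term.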
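The abstract does not state the path consistency objective, so the following is only a generic illustration of why consistency along a flow-matching path makes few-step sampling viable. It assumes a straight interpolation x_t = (1 - t) x0 + t x1 from noise (t = 0) to data (t = 1), which is one common convention, and uses a hypothetical penalty: the squared gap between the velocity predicted at a point and the velocity predicted one Euler step later. This is not claimed to be the loss used by UniLumos.

```python
from typing import Callable, List

Vec = List[float]
VelocityFn = Callable[[Vec, float], Vec]   # (x_t, t) -> predicted dx/dt


def lerp_path(x0: Vec, x1: Vec, t: float) -> Vec:
    """Point on the straight path x_t = (1 - t) * x0 + t * x1 (t=0 noise, t=1 data)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def euler_sample(velocity: VelocityFn, x0: Vec, steps: int) -> Vec:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def path_consistency_loss(velocity: VelocityFn, x_t: Vec, t: float, dt: float) -> float:
    """Hypothetical penalty: mean squared gap between the velocity at (x_t, t)
    and the velocity one Euler step of size dt further along the path."""
    v_now = velocity(x_t, t)
    x_next = [xi + dt * vi for xi, vi in zip(x_t, v_now)]
    v_next = velocity(x_next, t + dt)
    return sum((a - b) ** 2 for a, b in zip(v_now, v_next)) / len(v_now)


# Toy endpoints standing in for a noise sample and a data sample.
x0: Vec = [0.0, 0.0]
x1: Vec = [1.0, -2.0]


def straight(x: Vec, t: float) -> Vec:
    # Constant velocity x1 - x0: the flow-matching target for a straight path.
    return [b - a for a, b in zip(x0, x1)]


def accelerating(x: Vec, t: float) -> Vec:
    # Same endpoints (the integral of 2t over [0, 1] is 1), but the speed
    # changes with t, so velocities at different points disagree.
    return [2.0 * t * (b - a) for a, b in zip(x0, x1)]
```

With these toy fields, `euler_sample(straight, x0, 1)` lands exactly on `x1`, while `euler_sample(accelerating, x0, 1)` stays at `x0` and only approaches `x1` after many steps. The penalty is zero for the constant field and positive for the time-varying one, which is the property a consistency-style objective rewards.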
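LumosBench is described as scoring relighting precision separately for each lighting attribute, with a large vision-language model acting as the judge. The sketch below covers only the aggregation step and rests on assumptions. Each test case is taken to yield a yes/no verdict per attribute (the VLM prompting is not shown), and the attribute names are hypothetical placeholders rather than the six dimensions of the actual annotation protocol. Scores are reported per dimension and as an unweighted mean, so each attribute carries equal weight regardless of how many cases probe it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One verdict: (lighting dimension, whether the judge said the output matches
# the requested value for that dimension). Dimension names are placeholders.
Judgment = Tuple[str, bool]


def per_dimension_accuracy(judgments: Iterable[Judgment]) -> Dict[str, float]:
    """Fraction of passed verdicts for each dimension, scored independently."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for dim, ok in judgments:
        total[dim] += 1
        passed[dim] += int(ok)
    return {dim: passed[dim] / total[dim] for dim in total}


def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean over dimensions, so each attribute counts equally."""
    return sum(scores.values()) / len(scores)


# Hypothetical verdicts for a handful of relit samples.
judgments = [
    ("direction", True), ("direction", False),
    ("intensity", True), ("intensity", True),
    ("color_temperature", False),
]
scores = per_dimension_accuracy(judgments)
# scores == {"direction": 0.5, "intensity": 1.0, "color_temperature": 0.0}
overall = macro_average(scores)  # 0.5
```

Keeping the per-dimension table alongside the mean is what makes such an evaluation interpretable: a model that controls intensity well but misplaces light direction shows up as exactly that.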