UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
November 3, 2025
Authors: Ropeway Liu, Hangjie Yuan, Bo Dong, Jiazheng Xing, Jinwang Wang, Rui Zhao, Yan Xing, Weihua Chen, Fan Wang
cs.AI
Abstract
Relighting is a crucial task with both practical demand and artistic value,
and recent diffusion models have shown strong potential by enabling rich and
controllable lighting effects. However, as they are typically optimized in
semantic latent space, where proximity does not guarantee physical correctness
in visual space, they often produce unrealistic results, such as overexposed
highlights, misaligned shadows, and incorrect occlusions. We address this with
UniLumos, a unified relighting framework for both images and videos that brings
RGB-space geometry feedback into a flow matching backbone. By supervising the
model with depth and normal maps extracted from its outputs, we explicitly
align lighting effects with the scene structure, enhancing physical
plausibility. Nevertheless, this feedback requires high-quality outputs for
supervision in visual space, making standard multi-step denoising
computationally expensive. To mitigate this, we employ path consistency
learning, allowing supervision to remain effective even under few-step training
regimes. To enable fine-grained relighting control and supervision, we design a
structured six-dimensional annotation protocol capturing core illumination
attributes. Building upon this, we propose LumosBench, a disentangled
attribute-level benchmark that evaluates lighting controllability via large
vision-language models, enabling automatic and interpretable assessment of
relighting precision across individual dimensions. Extensive experiments
demonstrate that UniLumos achieves state-of-the-art relighting quality with
significantly improved physical consistency, while delivering a 20x speedup for
both image and video relighting. Code is available at
https://github.com/alibaba-damo-academy/Lumos-Custom.
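
To make the idea of RGB-space geometry feedback more concrete, here is a minimal sketch of what a depth-and-normal consistency term might look like. The abstract does not give the loss form, so everything here is an assumption for illustration: the function name `geometry_feedback_loss`, the L1 depth term, the cosine normal term, and the `normal_weight` parameter are all hypothetical. In practice the depth and normal maps would come from estimators applied to the relit output and to a reference; here they are plain arrays.

```python
import numpy as np


def geometry_feedback_loss(depth_pred, depth_ref, normals_pred, normals_ref,
                           normal_weight=1.0):
    """Hypothetical geometry-consistency penalty (not the paper's exact loss).

    depth_*:   arrays of shape (H, W)
    normals_*: arrays of shape (H, W, 3), not necessarily unit length
    """
    # Assumed depth term: mean absolute difference between the two depth maps.
    depth_term = np.mean(np.abs(depth_pred - depth_ref))

    def unit(n):
        # Normalize each normal vector, guarding against zero-length vectors.
        norm = np.linalg.norm(n, axis=-1, keepdims=True)
        return n / np.clip(norm, 1e-8, None)

    # Assumed normal term: 1 - cosine similarity, averaged over pixels.
    cos = np.sum(unit(normals_pred) * unit(normals_ref), axis=-1)
    normal_term = np.mean(1.0 - cos)

    return depth_term + normal_weight * normal_term


if __name__ == "__main__":
    depth = np.ones((4, 4))
    normals = np.zeros((4, 4, 3))
    normals[..., 2] = 1.0  # all normals point along +z

    # Identical geometry gives zero penalty; flipped normals are penalized.
    print(geometry_feedback_loss(depth, depth, normals, normals))
    print(geometry_feedback_loss(depth, depth, normals, -normals))
```

Under these assumptions, a relit output whose estimated geometry matches the reference incurs no penalty, while lighting changes that distort the apparent scene structure raise the loss.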