Learning Latent Proxies for Controllable Single-Image Relighting
March 16, 2026
Authors: Haoze Zheng, Zihao Wang, Xianfeng Wu, Yajing Bai, Yexin Liu, Yun Li, Xiaogang Xu, Harry Yang
cs.AI
Abstract
Single-image relighting is highly under-constrained: small illumination changes can produce large, nonlinear variations in shading, shadows, and specularities, while geometry and materials remain unobserved. Existing diffusion-based approaches either rely on intrinsic-decomposition or G-buffer pipelines that require dense, fragile supervision, or operate purely in latent space without physical grounding, making fine-grained control of lighting direction, intensity, and color unreliable. We observe that a full intrinsic decomposition is unnecessary for accurate relighting; sparse but physically meaningful cues, indicating where illumination should change and how materials should respond, are sufficient to guide a diffusion model. Based on this insight, we introduce LightCtrl, a framework that integrates physical priors at two levels: a few-shot latent proxy encoder that extracts compact material-geometry cues from limited PBR supervision, and a lighting-aware mask that identifies illumination-sensitive regions and steers the denoiser toward shading-relevant pixels. To compensate for scarce PBR data, we refine the proxy branch with a DPO-based objective that enforces physical consistency in the predicted cues. We also present ScaLight, a large-scale object-level dataset with systematically varied illumination and complete camera-light metadata, enabling physically consistent and controllable training. Across object- and scene-level benchmarks, our method achieves photometrically faithful relighting with accurate continuous control, surpassing prior diffusion-based and intrinsic-decomposition baselines, with gains of up to +2.4 dB PSNR and 35% lower RMSE under controlled lighting shifts.
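The abstract does not specify how the lighting-aware mask enters the denoiser; one plausible reading is a per-pixel reweighting of the standard diffusion noise-prediction loss so that illumination-sensitive regions dominate the objective. The sketch below illustrates only that reading under stated assumptions; `masked_denoising_loss`, `light_mask`, and `alpha` are hypothetical names, not the paper's API.

```python
import torch
import torch.nn.functional as F

def masked_denoising_loss(eps_pred: torch.Tensor,
                          eps_true: torch.Tensor,
                          light_mask: torch.Tensor,
                          alpha: float = 1.0) -> torch.Tensor:
    """Reweight the epsilon-prediction MSE toward shading-relevant pixels.

    eps_pred, eps_true: (B, C, H, W) predicted / ground-truth noise.
    light_mask:         (B, 1, H, W) in [0, 1]; 1 = illumination-sensitive.
    alpha:              extra weight on masked regions (hypothetical knob).
    """
    per_pixel = F.mse_loss(eps_pred, eps_true, reduction="none")
    # Base weight of 1 everywhere keeps gradients outside the mask nonzero;
    # masked pixels receive up to (1 + alpha) times the weight.
    weights = 1.0 + alpha * light_mask
    return (weights * per_pixel).mean()
```

An alternative reading, steering cross-attention rather than the loss, would be equally consistent with the abstract's wording.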
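For reference, the standard DPO objective (Rafailov et al., 2023) adapted to the proxy branch would take the form below. How LightCtrl actually constructs preference pairs of physically consistent versus inconsistent cue predictions is not stated in the abstract, so the pairing of $c^{+}$ over $c^{-}$ here is an assumption.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\,c^{+},\,c^{-})}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(c^{+}\mid x)}{\pi_{\mathrm{ref}}(c^{+}\mid x)}
      - \beta \log \frac{\pi_\theta(c^{-}\mid x)}{\pi_{\mathrm{ref}}(c^{-}\mid x)}
    \right)
  \right]
```

Here $\pi_\theta$ would be the proxy branch being refined, $\pi_{\mathrm{ref}}$ a frozen reference copy, $\sigma$ the logistic function, and $\beta$ a temperature controlling how strongly preferred (physically consistent) cues are favored.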