Learning Latent Proxies for Controllable Single-Image Relighting

Abstract

Single-image relighting is highly under-constrained: small illumination changes can produce large, nonlinear variations in shading, shadows, and specularities, while geometry and materials remain unobserved. Existing diffusion-based approaches either rely on intrinsic or G-buffer pipelines that require dense and fragile supervision, or operate purely in latent space without physical grounding, which makes fine-grained control of direction, intensity, and color unreliable. We observe that a full intrinsic decomposition is unnecessary and redundant for accurate relighting. Instead, sparse but physically meaningful cues, indicating where illumination should change and how materials should respond, are sufficient to guide a diffusion model. Based on this insight, we introduce LightCtrl, which integrates physical priors at two levels: a few-shot latent proxy encoder that extracts compact material-geometry cues from limited PBR supervision, and a lighting-aware mask that identifies illumination-sensitive regions and steers the denoiser toward shading-relevant pixels. To compensate for scarce PBR data, we refine the proxy branch using a DPO-based objective that enforces physical consistency in the predicted cues. We also present ScaLight, a large-scale object-level dataset with systematically varied illumination and complete camera-light metadata, enabling physically consistent and controllable training. Across object- and scene-level benchmarks, our method achieves photometrically faithful relighting with accurate continuous control, surpassing prior diffusion and intrinsic-based baselines, including gains of up to +2.4 dB PSNR and 35% lower RMSE under controlled lighting shifts.
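
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how RMSE and PSNR are conventionally computed between a relit image and its reference. It assumes pixel values are floats in [0, 1] and flattened into a sequence; the function names and sample values are illustrative and are not taken from the paper, whose exact evaluation protocol is not stated here.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    squared_error = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(squared_error / len(pred))


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; assumes values lie in [0, max_value]."""
    err = rmse(pred, target)
    if err == 0.0:
        return math.inf  # identical images have unbounded PSNR
    return 20.0 * math.log10(max_value / err)


# Hypothetical flattened pixel values, for illustration only.
relit = [0.5, 0.25, 0.75, 1.0]
reference = [0.5, 0.5, 0.75, 0.75]
print(f"RMSE = {rmse(relit, reference):.4f}, PSNR = {psnr(relit, reference):.2f} dB")
```

Lower RMSE and higher PSNR both indicate a relit image closer to the reference; the two are linked through the logarithm, so they are not independent measurements.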
