Learning Latent Proxies for Controllable Single-Image Relighting

Abstract

Single-image relighting is highly under-constrained: small illumination changes can produce large, nonlinear variations in shading, shadows, and specularities, while geometry and materials remain unobserved. Existing diffusion-based approaches either rely on intrinsic or G-buffer pipelines that require dense and fragile supervision, or operate purely in latent space without physical grounding, making fine-grained control of direction, intensity, and color unreliable. We observe that a full intrinsic decomposition is unnecessary and redundant for accurate relighting. Instead, sparse but physically meaningful cues, indicating where illumination should change and how materials should respond, are sufficient to guide a diffusion model. Based on this insight, we introduce LightCtrl, which integrates physical priors at two levels: a few-shot latent proxy encoder that extracts compact material-geometry cues from limited PBR supervision, and a lighting-aware mask that identifies illumination-sensitive regions and steers the denoiser toward shading-relevant pixels. To compensate for scarce PBR data, we refine the proxy branch using a DPO-based objective that enforces physical consistency in the predicted cues. We also present ScaLight, a large-scale object-level dataset with systematically varied illumination and complete camera-light metadata, enabling physically consistent and controllable training. Across object-level and scene-level benchmarks, our method achieves photometrically faithful relighting with accurate continuous control, surpassing prior diffusion-based and intrinsic-based baselines, with gains of up to +2.4 dB PSNR and 35% lower RMSE under controlled lighting shifts.
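For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how PSNR and RMSE are commonly computed. It is not the paper's evaluation code: the function names, the [0, 1] pixel range, and the sample pixel values are assumptions made purely for illustration.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.sqrt(mse)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the peak pixel value."""
    err = rmse(pred, target)
    if err == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 20.0 * math.log10(max_value / err)


# Hypothetical flattened pixel values in [0, 1], for illustration only.
target = [0.0, 0.5, 1.0, 0.25]
baseline = [0.1, 0.4, 0.9, 0.35]     # every pixel off by 0.1
improved = [0.05, 0.45, 0.95, 0.30]  # every pixel off by 0.05

print(rmse(baseline, target), psnr(baseline, target))
print(rmse(improved, target), psnr(improved, target))
```

On a single image the two metrics are tied together, since PSNR is a logarithm of the inverse RMSE. Benchmark figures are typically averaged over many images, so a reported PSNR gain and a reported RMSE reduction need not convert into each other exactly.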
