Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization

Abstract

This paper introduces Comprehensive Relighting, the first all-in-one approach that can both control and harmonize the lighting of an image or video of humans with arbitrary body parts from any scene. Building such a generalizable model is extremely challenging because suitable datasets are lacking, which restricts existing image-based relighting models to specific scenarios (e.g., faces or static humans). To address this challenge, we repurpose a pre-trained diffusion model as a general image prior and jointly model human relighting and background harmonization in a coarse-to-fine framework. To further enhance the temporal coherence of the relighting, we introduce an unsupervised temporal lighting model that learns lighting cycle consistency from many real-world videos without any ground truth. At inference time, the temporal lighting module is combined with the diffusion model through spatio-temporal feature blending algorithms, without extra training, and a new guided refinement is applied as a post-processing step to preserve the high-frequency details of the input image. In experiments, Comprehensive Relighting shows strong generalizability and temporal lighting coherence, outperforming existing image-based human relighting and harmonization methods.
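The abstract does not specify how the guided refinement preserves high-frequency detail, so the following is only an illustrative sketch of one common way to keep input detail in a relit result. It is not the paper's method. It assumes a simple additive frequency split with a box blur, and the names `box_blur`, `transfer_detail`, and the `radius` parameter are hypothetical.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Separable box blur over an H x W x C float image, using edge padding."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Blur along the vertical axis, then along the horizontal axis.
    blurred = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, padded
    )
    blurred = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, blurred
    )
    return blurred


def transfer_detail(input_img: np.ndarray, relit_img: np.ndarray, radius: int = 2) -> np.ndarray:
    """Keep the low-frequency lighting of relit_img and add back input_img's detail.

    Assumption: both images are float arrays in [0, 1] with identical H x W x C shape.
    """
    detail = input_img - box_blur(input_img, radius)  # high-frequency residual of the input
    base = box_blur(relit_img, radius)                # low-frequency layer of the relit image
    return np.clip(base + detail, 0.0, 1.0)
```

In this sketch, a larger `radius` moves more of the input's appearance into the detail layer, which preserves more texture but can also carry over some of the original shading. A learned or guided filter would handle that trade-off more carefully.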
