Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization

Abstract

This paper introduces Comprehensive Relighting, the first all-in-one approach that can both control and harmonize the lighting in an image or video of humans with arbitrary body parts from any scene. Building such a generalizable model is extremely challenging because suitable datasets are lacking, which has restricted existing image-based relighting models to specific scenarios (e.g., faces or static humans). To address this challenge, we repurpose a pre-trained diffusion model as a general image prior and jointly model human relighting and background harmonization in a coarse-to-fine framework. To further enhance the temporal coherence of the relighting, we introduce an unsupervised temporal lighting model that learns lighting cycle consistency from many real-world videos without any ground truth. At inference time, our temporal lighting module is combined with the diffusion model through spatio-temporal feature blending algorithms, with no extra training. We also apply a new guided refinement as a post-processing step to preserve the high-frequency details of the input image. In our experiments, Comprehensive Relighting shows strong generalizability and temporally coherent lighting, outperforming existing image-based human relighting and harmonization methods.
