Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

Abstract

Video relighting is a challenging yet valuable task that aims to replace the background of a video while adjusting the foreground lighting so that the two blend harmoniously. During this translation, it is essential to preserve the original properties of the foreground, e.g., albedo, and to propagate consistent relighting across frames. In this paper, we propose Lumen, an end-to-end video relighting framework built on large-scale video generative models, which accepts flexible textual descriptions to control the lighting and background. Given the scarcity of high-quality paired videos showing the same foreground under different lighting conditions, we construct a large-scale dataset that mixes realistic and synthetic videos. For the synthetic domain, we draw on the abundant 3D assets available in the community and use an advanced 3D rendering engine to curate video pairs in diverse environments. For the realistic domain, we adapt an HDR-based lighting simulation to compensate for the lack of paired in-the-wild videos. Using this dataset, we design a joint training curriculum that exploits the strengths of each domain, i.e., the physical consistency of synthetic videos and the generalized domain distribution of realistic videos. To implement this, we inject a domain-aware adapter into the model to decouple the learning of relighting from the learning of the domain appearance distribution. We construct a comprehensive benchmark to evaluate Lumen alongside existing methods in terms of foreground preservation and video consistency. Experimental results demonstrate that Lumen effectively edits input videos into cinematic relit videos with consistent lighting and strict foreground preservation. Our project page: https://lumen-relight.github.io/

