Relit-LiVE: Relight Video by Jointly Learning Environment Video

Abstract

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera poses. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that, in a single diffusion process, simultaneously generates the relit video and per-frame environment maps aligned with each camera viewpoint. This joint prediction enforces strong alignment between geometry and illumination and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while relaxing the need for known per-frame camera poses. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods on both synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The project is available at https://github.com/zhuxing0/Relit-LiVE.
