Relit-LiVE: Relight Video by Jointly Learning Environment Video
May 7, 2026
Authors: Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang
cs.AI
Abstract
Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera poses. Our key insight is to explicitly introduce the raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that generates relit videos and per-frame environment maps aligned with each camera viewpoint simultaneously, in a single diffusion process. This joint prediction enforces strong geometric-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while relaxing the requirement of known per-frame camera poses. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. Project page: https://github.com/zhuxing0/Relit-LiVE.
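To make the joint-prediction formulation concrete, the sketch below shows the sampling loop shape it implies: at every diffusion step, one denoiser operates on the relit-video latent and the per-frame environment-map latent together, conditioned on both intrinsic maps and the raw reference frames. Everything here is illustrative (a toy NumPy stand-in with made-up shapes and a fake noise estimate), not the paper's actual video diffusion architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W, C = 4, 8, 16, 4   # frames, latent height/width, channels (toy sizes)
He, We = 4, 8              # per-frame environment-map latent size (toy)

def toy_joint_denoiser(video_lat, env_lat, intrinsics, reference):
    """Toy stand-in for the joint denoising network.

    Fusing raw reference frames with intrinsic conditions mirrors the
    abstract's idea of recovering cues lost in decomposition; the real
    model is a large video diffusion network, not this linear map.
    """
    cond = 0.5 * intrinsics + 0.5 * reference            # fuse conditions
    video_eps = video_lat - 0.1 * cond                   # fake noise estimate
    # The environment latent sees a pooled summary of the conditioned video,
    # coupling the predicted illumination to the observed frames (the
    # "joint prediction" aspect of the formulation).
    summary = cond.mean(axis=(1, 2), keepdims=True)      # (T, 1, 1, C)
    env_eps = env_lat - 0.1 * np.broadcast_to(summary, env_lat.shape)
    return video_eps, env_eps

def ddim_step(x, eps, alpha_t, alpha_next):
    """One deterministic DDIM-style update, shared by both latent streams."""
    x0 = (x - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    return np.sqrt(alpha_next) * x0 + np.sqrt(1.0 - alpha_next) * eps

video_lat = rng.standard_normal((T, H, W, C))    # noisy relit-video latent
env_lat = rng.standard_normal((T, He, We, C))    # one env-map latent per frame
intrinsics = rng.standard_normal((T, H, W, C))   # albedo/normal/etc. latents
reference = rng.standard_normal((T, H, W, C))    # raw input frames as latents

alphas = np.linspace(0.1, 0.99, 10)              # toy schedule: noisy -> clean
for i in range(len(alphas) - 1):
    v_eps, e_eps = toy_joint_denoiser(video_lat, env_lat, intrinsics, reference)
    video_lat = ddim_step(video_lat, v_eps, alphas[i], alphas[i + 1])
    env_lat = ddim_step(env_lat, e_eps, alphas[i], alphas[i + 1])

print(video_lat.shape, env_lat.shape)
```

The point of the sketch is structural: both latent streams share one sampling loop and one (toy) denoiser call per step, so the environment maps are produced frame-aligned with the relit video rather than estimated in a separate pass.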