Relit-LiVE: Relight Video by Jointly Learning Environment Video

May 7, 2026
作者: Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang
cs.AI

Abstract

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that simultaneously generates relit videos and per-frame environment maps aligned with each camera viewpoint in a single diffusion process. This joint prediction enforces strong geometric-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while easing the requirement of known per-frame camera poses. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The project is available at https://github.com/zhuxing0/Relit-LiVE.