Relit-LiVE: Relight Video by Jointly Learning Environment Video

Abstract

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that simultaneously generates relit videos and per-frame environment maps aligned with each camera viewpoint in a single diffusion process. This joint prediction enforces strong geometry-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while easing the requirement of known per-frame camera poses. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The project is available at https://github.com/zhuxing0/Relit-LiVE.
