V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

December 12, 2025
Authors: Ye Fang, Tong Wu, Valentin Deschaintre, Duygu Ceylan, Iliyan Georgiev, Chun-Hao Paul Huang, Yiwei Hu, Xuelin Chen, Tuanfeng Yang Wang
cs.AI

Abstract

Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework that jointly understands intrinsic scene properties (e.g., albedo, normal, material, and irradiance), leverages them for video synthesis, and supports editable intrinsic representations remains unexplored. We present V-RGBX, the first end-to-end framework for intrinsic-aware video editing. V-RGBX unifies three key capabilities: (1) video inverse rendering into intrinsic channels, (2) photorealistic video synthesis from these intrinsic representations, and (3) keyframe-based video editing conditioned on intrinsic channels. At the core of V-RGBX is an interleaved conditioning mechanism that enables intuitive, physically grounded video editing through user-selected keyframes, supporting flexible manipulation of any intrinsic modality. Extensive qualitative and quantitative results show that V-RGBX produces temporally consistent, photorealistic videos while propagating keyframe edits across sequences in a physically plausible manner. We demonstrate its effectiveness in diverse applications, including object appearance editing and scene-level relighting, surpassing the performance of prior methods.
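
To make the idea of editing in intrinsic space concrete, the sketch below walks through a toy decompose, edit, recompose loop. It is not the V-RGBX method: it assumes the classical Lambertian intrinsic image model (RGB = albedo × irradiance), assumes the irradiance is already known, and applies the albedo edit to every frame directly, whereas V-RGBX uses learned models for inverse rendering, synthesis, and keyframe propagation. All function names here are hypothetical.

```python
import numpy as np

def decompose(rgb, irradiance, eps=1e-6):
    """Toy inverse rendering: recover albedo assuming rgb = albedo * irradiance.

    Assumes irradiance is known; a real system would estimate it.
    """
    return rgb / np.maximum(irradiance, eps)

def compose(albedo, irradiance):
    """Toy forward rendering under the same Lambertian assumption."""
    return np.clip(albedo * irradiance, 0.0, 1.0)

def edit_video(rgb_frames, irradiance_frames, tint):
    """Tint the albedo of every frame while keeping each frame's own lighting."""
    albedo = decompose(rgb_frames, irradiance_frames)
    edited_albedo = albedo * tint  # the edit happens in intrinsic space
    return compose(edited_albedo, irradiance_frames)

# Synthetic 4-frame, 2x2-pixel video: constant grey albedo, varying lighting.
rng = np.random.default_rng(0)
true_albedo = np.full((4, 2, 2, 3), 0.5)
irradiance = rng.uniform(0.2, 1.0, size=(4, 2, 2, 1))
video = compose(true_albedo, irradiance)

# Keep red, halve green and blue: the surface turns reddish in every frame.
tint = np.array([1.0, 0.5, 0.5])
edited = edit_video(video, irradiance, tint)
```

Because the edit touches only the albedo, the per-frame lighting variation carries over to the edited video unchanged, which is the sense in which an intrinsic-space edit stays physically consistent across a sequence.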