V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
December 12, 2025
Authors: Ye Fang, Tong Wu, Valentin Deschaintre, Duygu Ceylan, Iliyan Georgiev, Chun-Hao Paul Huang, Yiwei Hu, Xuelin Chen, Tuanfeng Yang Wang
cs.AI
Abstract
Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework that jointly understands intrinsic scene properties (e.g., albedo, normal, material, and irradiance), leverages them for video synthesis, and supports editable intrinsic representations remains unexplored. We present V-RGBX, the first end-to-end framework for intrinsic-aware video editing. V-RGBX unifies three key capabilities: (1) video inverse rendering into intrinsic channels, (2) photorealistic video synthesis from these intrinsic representations, and (3) keyframe-based video editing conditioned on intrinsic channels. At the core of V-RGBX is an interleaved conditioning mechanism that enables intuitive, physically grounded video editing through user-selected keyframes, supporting flexible manipulation of any intrinsic modality. Extensive qualitative and quantitative results show that V-RGBX produces temporally consistent, photorealistic videos while propagating keyframe edits across sequences in a physically plausible manner. We demonstrate its effectiveness in diverse applications, including object appearance editing and scene-level relighting, surpassing the performance of prior methods.
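To make the closed-loop workflow concrete, the toy sketch below mirrors the data flow the abstract describes: invert a video into intrinsic channels, edit one intrinsic modality on a user-selected keyframe, and re-synthesize the full clip conditioned on the intrinsics plus the edited keyframe. All function names, tensor shapes, and the crude albedo-times-irradiance shading proxy are illustrative placeholders, not the actual V-RGBX models or API; only the RGB → X → edit → RGB loop comes from the paper's description.

```python
# Hypothetical sketch of the closed-loop intrinsic editing workflow.
# Function names and shapes are placeholders, not the V-RGBX implementation.
import numpy as np

T, H, W = 16, 64, 64  # a short toy clip

def inverse_render(video):
    """Stub for capability (1): decompose RGB frames into intrinsic channels."""
    return {
        "albedo":     np.random.rand(T, H, W, 3),
        "normal":     np.random.rand(T, H, W, 3),
        "material":   np.random.rand(T, H, W, 2),  # e.g. roughness / metallic
        "irradiance": np.random.rand(T, H, W, 3),
    }

def synthesize(intrinsics, keyframe_edits):
    """Stub for capabilities (2) and (3): re-render a video from intrinsic
    channels, with user-edited keyframes interleaved as conditioning frames."""
    # Crude Lambertian proxy standing in for the learned generative model.
    video = intrinsics["albedo"] * intrinsics["irradiance"]
    for t, frame in keyframe_edits.items():
        video[t] = frame  # pin the user-edited keyframes
    return np.clip(video, 0.0, 1.0)

rgb = np.random.rand(T, H, W, 3)                   # input video
x = inverse_render(rgb)                            # (1) video -> intrinsic channels
x["albedo"][..., 0] *= 0.5                         # edit one modality (albedo)
edits = {0: x["albedo"][0] * x["irradiance"][0]}   # edited keyframe 0
out = synthesize(x, edits)                         # (2)+(3) intrinsics + keyframe -> video
print(out.shape)                                   # (16, 64, 64, 3)
```

In the actual framework the stub functions above would be learned video models, and the edit-propagation step would be handled by the interleaved conditioning mechanism rather than by directly pinning frames; the sketch only illustrates how the three capabilities compose into one loop.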