Portrait Video Editing Empowered by Multimodal Generative Priors
September 20, 2024
Authors: Xuan Gao, Haiyao Xiao, Chenglai Zhong, Shimin Hu, Yudong Guo, Juyong Zhang
cs.AI
Abstract
We introduce PortraitGen, a powerful portrait video editing method that
achieves consistent and expressive stylization with multimodal prompts.
Traditional portrait video editing methods often struggle with 3D and temporal
consistency and typically fall short in rendering quality and efficiency. To address
these issues, we lift the portrait video frames to a unified dynamic 3D
Gaussian field, which ensures structural and temporal coherence across frames.
Furthermore, we design a novel Neural Gaussian Texture mechanism that not only
enables sophisticated style editing but also achieves rendering speeds of over
100 FPS. Our approach incorporates multimodal inputs through knowledge distilled
from large-scale 2D generative models. Our system also incorporates expression
similarity guidance and a face-aware portrait editing module, effectively
mitigating degradation issues associated with iterative dataset updates.
Extensive experiments demonstrate the temporal consistency, editing efficiency,
and superior rendering quality of our method. The broad applicability of the
proposed approach is demonstrated through various applications, including
text-driven editing, image-driven editing, and relighting, highlighting its
great potential to advance the field of video editing. Demo videos and released
code are provided on our project page: https://ustc3dv.github.io/PortraitGen/