VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

June 18, 2024
Authors: Jing Gu, Yuwei Fang, Ivan Skorokhodov, Peter Wonka, Xinya Du, Sergey Tulyakov, Xin Eric Wang
cs.AI

Abstract

Video editing is a cornerstone of digital media, spanning entertainment, education, and professional communication. However, previous methods often overlook the need to understand both global and local context, which leads to inaccurate and inconsistent edits across the spatiotemporal dimension, especially in long videos. In this paper, we introduce VIA, a unified spatiotemporal VIdeo Adaptation framework for global and local video editing that pushes the limits of consistently editing minute-long videos. First, to ensure local consistency within individual frames, VIA builds on a novel test-time editing adaptation method, which adapts a pre-trained image editing model to improve consistency between potential editing directions and the text instruction, and adapts masked latent variables for precise local control. Furthermore, to maintain global consistency across the video sequence, we introduce spatiotemporal adaptation, which adapts consistent attention variables in key frames and strategically applies them across the whole sequence to realize the editing effects. Extensive experiments show that, compared to baseline methods, VIA produces edits that are more faithful to the source videos, more coherent in the spatiotemporal context, and more precise in local control. More importantly, we show that VIA can achieve consistent long-video editing in minutes, unlocking the potential for advanced video editing tasks over long video sequences.
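The abstract names two mechanisms without spelling them out: masked latent variables for local control, and attention variables computed on key frames and reused across the sequence. The following is a minimal sketch of two generic building blocks in that spirit, not VIA's actual implementation: blending an edited latent with the source latent through a binary mask, and scaled dot-product attention in which a frame's queries attend to keys and values cached from a key frame. The function names, the (C, H, W) latent layout, the cache structure, and the nearest-key-frame assignment rule are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_latent_blend(z_edit: np.ndarray, z_src: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the edited latent inside the mask and the source latent outside it.

    mask is 1.0 where editing is allowed and 0.0 elsewhere; a (1, H, W) mask
    broadcasts over the channel axis of (C, H, W) latents.
    """
    return mask * z_edit + (1.0 - mask) * z_src


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def keyframe_attention(q_frame: np.ndarray, k_key: np.ndarray, v_key: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: queries from the current frame (n, d)
    attend to keys (m, d) and values (m, d_v) cached from a key frame."""
    d = q_frame.shape[-1]
    weights = softmax(q_frame @ k_key.T / np.sqrt(d), axis=-1)  # (n, m), rows sum to 1
    return weights @ v_key  # (n, d_v)


def nearest_keyframe(frame_idx: int, keyframes: list[int]) -> int:
    # Hypothetical assignment rule: each frame borrows from the closest key frame.
    return min(keyframes, key=lambda k: abs(k - frame_idx))


# Demo with random tensors (illustrative sizes only).
rng = np.random.default_rng(0)
z_src = rng.standard_normal((4, 8, 8))   # source latent, (C, H, W)
z_edit = rng.standard_normal((4, 8, 8))  # latent after an editing step
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0                  # only a 4x4 region may change
z_local = masked_latent_blend(z_edit, z_src, mask)

keyframes = [0, 8, 16]
k_cache = {i: rng.standard_normal((16, 32)) for i in keyframes}  # per-key-frame keys
v_cache = {i: rng.standard_normal((16, 32)) for i in keyframes}  # per-key-frame values
q5 = rng.standard_normal((16, 32))                               # queries of frame 5
src = nearest_keyframe(5, keyframes)
out5 = keyframe_attention(q5, k_cache[src], v_cache[src])
```

Reusing one key frame's keys and values for its neighbors is one simple way to make nearby frames share appearance information; how VIA actually selects key frames and propagates its attention variables is described in the paper, not here.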

December 4, 2024