AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
March 21, 2024
Authors: Max Ku, Cong Wei, Weiming Ren, Huan Yang, Wenhu Chen
cs.AI
Abstract
Video-to-video editing involves editing a source video along with additional
control (such as text prompts, subjects, or styles) to generate a new video
that aligns with the source video and the provided control. Traditional methods
have been constrained to certain editing types, limiting their ability to meet
the wide range of user demands. In this paper, we introduce AnyV2V, a novel
training-free framework designed to simplify video editing into two primary
steps: (1) employing an off-the-shelf image editing model (e.g.,
InstructPix2Pix, InstantID) to modify the first frame, and (2) utilizing an
existing image-to-video generation model (e.g., I2VGen-XL) for DDIM inversion
and feature injection. In the first stage, AnyV2V can plug in any existing
image editing tool to support an extensive array of video editing tasks.
Beyond traditional prompt-based editing methods, AnyV2V can also support
novel video editing tasks, including reference-based style transfer,
subject-driven editing, and identity manipulation, which were unattainable by
previous methods. In the second stage, AnyV2V can plug in any existing
image-to-video model to perform DDIM inversion and intermediate feature
injection, maintaining appearance and motion consistency with the source
video. On prompt-based editing, we show that AnyV2V outperforms the
previous best approach by 35% on prompt alignment and by 25% on human
preference. On the three novel tasks, we show that AnyV2V also achieves a high
success rate. We believe AnyV2V will continue to thrive due to its ability to
seamlessly integrate fast-evolving image editing methods. Such compatibility
can help AnyV2V increase its versatility and cater to diverse user demands.
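The abstract describes the two-stage recipe at a high level but gives no implementation details. As a minimal illustrative sketch only, the flow below assumes hypothetical `image_editor` and `i2v_model` interfaces; the method names (`edit`, `ddim_invert`, `sample`) are placeholders, not the authors' actual API:

```python
# Illustrative sketch of the two-stage AnyV2V recipe described above.
# All interfaces (image_editor.edit, i2v_model.ddim_invert, i2v_model.sample)
# are hypothetical placeholders, not the authors' actual API.

def anyv2v_edit(source_frames, control, image_editor, i2v_model, num_steps=50):
    """Edit a video by (1) modifying its first frame with an off-the-shelf
    image editor and (2) regenerating the clip with an image-to-video model
    guided by DDIM-inverted latents and injected intermediate features."""
    # Stage 1: apply any off-the-shelf image editor (e.g. InstructPix2Pix
    # for prompt-based edits, InstantID for identity manipulation) to the
    # first frame only.
    edited_first_frame = image_editor.edit(source_frames[0], control)

    # Stage 2a: DDIM-invert the source video with the image-to-video model
    # to obtain structured noise latents and cached intermediate features
    # that encode the source appearance and motion.
    inverted_latents, cached_features = i2v_model.ddim_invert(
        source_frames, num_steps=num_steps
    )

    # Stage 2b: sample conditioned on the edited first frame, starting from
    # the inverted latents and injecting the cached features so the output
    # keeps the source video's motion and layout.
    return i2v_model.sample(
        first_frame=edited_first_frame,
        init_latents=inverted_latents,
        injected_features=cached_features,
        num_steps=num_steps,
    )
```

Because both stages are plug-and-play, swapping `image_editor` or `i2v_model` for a different off-the-shelf model changes the supported editing task without retraining.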
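For context on the DDIM inversion used in the second stage (the abstract does not spell out the exact formulation AnyV2V uses), the standard deterministic DDIM inversion step runs the sampler update in reverse, mapping latents from timestep $t$ to $t+1$:

$$
x_{t+1} \;=\; \sqrt{\bar{\alpha}_{t+1}}\,\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}} \;+\; \sqrt{1-\bar{\alpha}_{t+1}}\,\epsilon_\theta(x_t, t),
$$

where $\epsilon_\theta$ is the denoising network and $\bar{\alpha}_t$ the cumulative noise schedule. Iterating this over all timesteps yields a structured noise latent that, when re-denoised, approximately reconstructs the source video; per the abstract, intermediate features gathered along the way are injected during sampling to preserve the source's appearance and motion.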