AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
March 21, 2024
Authors: Max Ku, Cong Wei, Weiming Ren, Huan Yang, Wenhu Chen
cs.AI
Abstract
Video-to-video editing involves editing a source video along with additional
control (such as text prompts, subjects, or styles) to generate a new video
that aligns with the source video and the provided control. Traditional methods
have been constrained to certain editing types, limiting their ability to meet
the wide range of user demands. In this paper, we introduce AnyV2V, a novel
training-free framework designed to simplify video editing into two primary
steps: (1) employing an off-the-shelf image editing model (e.g.,
InstructPix2Pix, InstantID) to modify the first frame, and (2) utilizing an
existing image-to-video generation model (e.g., I2VGen-XL) for DDIM inversion
and feature injection. In the first stage, AnyV2V can plug in any existing
image editing tool to support an extensive array of video editing tasks.
Beyond traditional prompt-based editing methods, AnyV2V can also support
novel video editing tasks, including reference-based style transfer,
subject-driven editing, and identity manipulation, which were unattainable by
previous methods. In the second stage, AnyV2V can plug in any existing
image-to-video model to perform DDIM inversion and intermediate feature
injection, maintaining appearance and motion consistency with the source
video. On prompt-based editing, we show that AnyV2V outperforms the
previous best approach by 35% on prompt alignment and by 25% on human
preference. On the three novel tasks, we show that AnyV2V also achieves a high
success rate. We believe AnyV2V will continue to thrive due to its ability to
seamlessly integrate fast-evolving image editing methods. Such
compatibility can help AnyV2V increase its versatility and cater to diverse
user demands.
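To make the two-stage recipe concrete, here is a minimal sketch of the first stage using the InstructPix2Pix pipeline from Hugging Face diffusers. Only frame 0 of the source video is edited; the file paths and the edit instruction below are placeholders, and any other off-the-shelf image editor (e.g., InstantID for identity manipulation) could be substituted at this step.

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

# Stage 1: edit only the first frame with an off-the-shelf image editor.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# Placeholder path: frame 0 extracted from the source video.
first_frame = load_image("source_video_frame0.png")

edited_frame = pipe(
    "turn the scene into a watercolor painting",  # placeholder edit instruction
    image=first_frame,
    num_inference_steps=50,
    image_guidance_scale=1.5,  # how strongly to stay faithful to the input frame
).images[0]
edited_frame.save("edited_frame0.png")
```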
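The core of the second stage is DDIM inversion of the source video under the image-to-video model, followed by sampling with feature injection. The sketch below shows only the generic deterministic DDIM inversion update (x_0 → x_T); the `unet`/`scheduler` interface is assumed to follow diffusers conventions, `cond` stands in for the model's first-frame and text conditioning, and the function name is illustrative. The paper's intermediate feature injection (copying convolutional and attention features from the source-video denoising pass into the edited-video pass) is not shown.

```python
import torch

@torch.no_grad()
def ddim_invert(unet, scheduler, latents, cond, num_steps=50):
    """Run the deterministic DDIM update in reverse to map clean video
    latents x_0 back to the noise x_T that reconstructs them."""
    scheduler.set_timesteps(num_steps)
    alphas = scheduler.alphas_cumprod.to(latents.device)
    step = scheduler.config.num_train_timesteps // num_steps
    x = latents
    # scheduler.timesteps is descending (high noise -> low); walk it upward.
    for t in reversed(scheduler.timesteps):
        t_cur = max(int(t) - step, 0)  # noise level x currently sits at
        eps = unet(x, t_cur, encoder_hidden_states=cond).sample
        a_cur, a_next = alphas[t_cur], alphas[int(t)]
        # Predict x_0 at the current level, then re-noise it to the next level.
        x0_pred = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # approximate x_T for the source video

# During editing, sampling restarts from this x_T conditioned on the *edited*
# first frame, while injected source features preserve motion and layout.
```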