AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

March 21, 2024
Authors: Max Ku, Cong Wei, Weiming Ren, Huan Yang, Wenhu Chen
cs.AI

Abstract

Video-to-video editing involves editing a source video along with additional control (such as text prompts, subjects, or styles) to generate a new video that aligns with the source video and the provided control. Traditional methods have been constrained to certain editing types, limiting their ability to meet the wide range of user demands. In this paper, we introduce AnyV2V, a novel training-free framework designed to simplify video editing into two primary steps: (1) employing an off-the-shelf image editing model (e.g., InstructPix2Pix, InstantID) to modify the first frame, and (2) utilizing an existing image-to-video generation model (e.g., I2VGen-XL) for DDIM inversion and feature injection. In the first stage, AnyV2V can plug in any existing image editing tool to support an extensive array of video editing tasks. Beyond traditional prompt-based editing, AnyV2V can also support novel video editing tasks, including reference-based style transfer, subject-driven editing, and identity manipulation, which were unattainable by previous methods. In the second stage, AnyV2V can plug in any existing image-to-video model to perform DDIM inversion and intermediate feature injection, maintaining appearance and motion consistency with the source video. On prompt-based editing, we show that AnyV2V outperforms the previous best approach by 35% on prompt alignment and by 25% on human preference. On the three novel tasks, we show that AnyV2V also achieves a high success rate. We believe AnyV2V will continue to thrive due to its ability to seamlessly integrate fast-evolving image editing methods; such compatibility helps AnyV2V increase its versatility to cater to diverse user demands.
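The two-stage recipe maps naturally onto off-the-shelf diffusion tooling. Below is a minimal sketch, not the authors' released code: Stage 1 uses the real InstructPix2Pix and I2VGen-XL pipelines from the diffusers library, while ddim_invert and sample_with_feature_injection are hypothetical placeholder names standing in for the paper's inversion and feature-injection mechanisms, and the frame paths and edit prompt are illustrative.

```python
# Hedged sketch of AnyV2V's two-stage pipeline (not the authors' code).
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, I2VGenXLPipeline
from diffusers.utils import load_image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumed: source video pre-extracted as 16 frames on disk.
source_frames = [load_image(f"frames/{i:03d}.png") for i in range(16)]

# Stage 1: edit only the first frame with any off-the-shelf image editor.
editor = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to(device)
edited_first_frame = editor(
    prompt="make it look like a watercolor painting",
    image=source_frames[0],
).images[0]

# Stage 2: propagate the edit with an image-to-video model. The source video
# is DDIM-inverted to noise latents, then denoising is conditioned on the
# edited first frame while intermediate features from the source branch are
# injected to preserve the original motion and layout.
i2v = I2VGenXLPipeline.from_pretrained(
    "ali-vilab/i2vgen-xl", torch_dtype=torch.float16
).to(device)

def ddim_invert(pipe, frames):
    """Placeholder: run the I2V model's DDIM schedule in reverse to recover
    latents that reconstruct the source video."""
    raise NotImplementedError

def sample_with_feature_injection(pipe, first_frame, latents):
    """Placeholder: denoise from the inverted latents, conditioning on the
    edited first frame and injecting source-branch features."""
    raise NotImplementedError

source_latents = ddim_invert(i2v, source_frames)
edited_video = sample_with_feature_injection(
    i2v, edited_first_frame, source_latents
)
```

Because both stages are plug-and-play, the InstructPix2Pix editor above could be swapped for any image editing model (e.g., InstantID for identity manipulation) and I2VGen-XL for any image-to-video model, which is the compatibility the abstract emphasizes.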
