ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing
May 26, 2023
Authors: Min Zhao, Rongzhen Wang, Fan Bao, Chongxuan Li, Jun Zhu
cs.AI
Abstract
In this paper, we present ControlVideo, a novel method for text-driven video
editing. Leveraging the capabilities of text-to-image diffusion models and
ControlNet, ControlVideo aims to enhance the fidelity and temporal consistency
of videos that align with a given text while preserving the structure of the
source video. This is achieved by incorporating additional conditions such as
edge maps and by fine-tuning the key-frame and temporal attention on the source
video-text pair with carefully designed strategies. An in-depth exploration of
ControlVideo's design is conducted to inform future research on one-shot tuning
of video diffusion models. Quantitatively, ControlVideo outperforms a range of
competitive baselines in terms of faithfulness and consistency while still
aligning with the textual prompt. Additionally, it delivers videos with high
visual realism and fidelity w.r.t. the source content, demonstrating
flexibility in utilizing controls containing varying degrees of source video
information, and the potential for multiple control combinations. The project
page is available at https://ml.cs.tsinghua.edu.cn/controlvideo/.