DragVideo: Interactive Drag-style Video Editing
December 3, 2023
Authors: Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang
cs.AI
Abstract
Editing visual content in videos remains a formidable challenge, with two main
issues: 1) providing direct and easy user control, and 2) producing natural
editing results free of unsightly distortion and artifacts after changing
shape, expression, and layout. Inspired by DragGAN, a recent image-based
drag-style editing technique, we address the above issues by proposing
DragVideo, which adopts a similar drag-style user interaction to edit video
content while maintaining temporal consistency. Empowered by recent diffusion
models, as in DragDiffusion, DragVideo introduces the novel Drag-on-Video
U-Net (DoVe) editing method, which optimizes the diffused video latents
generated by a video U-Net to achieve the desired control. Specifically, we
use sample-specific LoRA fine-tuning and mutual self-attention control to
ensure faithful reconstruction of the video by the DoVe method. We also
present a series of testing examples for drag-style video editing and conduct
extensive experiments across a wide array of challenging editing tasks, such
as motion editing and skeleton editing, underscoring DragVideo's versatility
and generality. Our code, including the DragVideo web user interface, will be
released.
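To give a concrete sense of the drag-style latent optimization the abstract describes, the sketch below shows a heavily simplified, DragGAN-style motion-supervision loop: a latent is optimized so that features at a handle point migrate toward a target point. This is an illustrative assumption-based toy, not the DoVe method itself; the tiny `feature_net` convolution is a hypothetical stand-in for the video U-Net feature extractor, and the point-tracking, LoRA fine-tuning, and mutual self-attention components of the real pipeline are omitted.

```python
import torch

# Hypothetical stand-in for the video U-Net feature extractor
# (the real method extracts features from a diffusion U-Net).
torch.manual_seed(0)
feature_net = torch.nn.Conv2d(4, 8, kernel_size=3, padding=1)

def motion_supervision_loss(feats, handle, target):
    """Simplified motion supervision: pull the feature vector one step
    along the handle-to-target direction toward the (detached) handle
    features, so gradients drag content in that direction."""
    hy, hx = handle
    d = torch.tensor([target[0] - hy, target[1] - hx], dtype=torch.float32)
    d = d / (d.norm() + 1e-8)                      # unit drag direction
    ny = int(round(hy + d[0].item()))              # one-pixel step
    nx = int(round(hx + d[1].item()))
    return torch.nn.functional.mse_loss(
        feats[:, :, ny, nx], feats[:, :, hy, hx].detach()
    )

# Optimize the latent (analogous to optimizing diffused video latents).
latent = torch.randn(1, 4, 16, 16, requires_grad=True)
opt = torch.optim.Adam([latent], lr=0.05)
for _ in range(20):
    feats = feature_net(latent)
    loss = motion_supervision_loss(feats, handle=(8, 8), target=(8, 13))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full system this inner loop would alternate with point tracking (re-locating the handle as content moves) and run per frame under temporal-consistency constraints; the toy above only shows the gradient-through-features mechanism.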