DragVideo: Interactive Drag-style Video Editing
December 3, 2023
Authors: Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang
cs.AI
Abstract
Editing visual content in videos remains a formidable challenge, with two main
issues: 1) providing direct and easy user control, and 2) producing natural
editing results free of unsightly distortion and artifacts after changing
shape, expression, and layout. Inspired by DragGAN, a recent image-based
drag-style editing technique, we address the above issues by proposing
DragVideo, which adopts a similar drag-style user interaction to edit video
content while maintaining temporal consistency. Empowered by recent diffusion
models, as in DragDiffusion, DragVideo introduces the novel Drag-on-Video
U-Net (DoVe) editing method, which optimizes the diffused video latents
generated by a video U-Net to achieve the desired control. Specifically, we
use sample-specific LoRA fine-tuning and mutual self-attention control to
ensure faithful reconstruction of the video by the DoVe method. We also
present a series of testing examples for drag-style video editing and conduct
extensive experiments across a wide array of challenging editing tasks, such
as motion editing and skeleton editing, underscoring DragVideo's versatility
and generality. Our code, including the DragVideo web user interface, will be
released.
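To give a concrete sense of the drag-style latent optimization the abstract describes, the sketch below shows a heavily simplified, DragGAN-style motion-supervision loop: a latent is optimized so that features at a handle point migrate toward a target point. This is an illustrative assumption-based toy, not the DoVe method itself; the tiny `feature_net` convolution is a hypothetical stand-in for the video U-Net feature extractor, and the point-tracking, LoRA fine-tuning, and mutual self-attention components of the real pipeline are omitted.

```python
import torch

# Hypothetical stand-in for the video U-Net feature extractor
# (the real method extracts features from a diffusion U-Net).
torch.manual_seed(0)
feature_net = torch.nn.Conv2d(4, 8, kernel_size=3, padding=1)

def motion_supervision_loss(feats, handle, target):
    """Simplified motion supervision: pull the feature vector one step
    along the handle-to-target direction toward the (detached) handle
    features, so gradients drag content in that direction."""
    hy, hx = handle
    d = torch.tensor([target[0] - hy, target[1] - hx], dtype=torch.float32)
    d = d / (d.norm() + 1e-8)                      # unit drag direction
    ny = int(round(hy + d[0].item()))              # one-pixel step
    nx = int(round(hx + d[1].item()))
    return torch.nn.functional.mse_loss(
        feats[:, :, ny, nx], feats[:, :, hy, hx].detach()
    )

# Optimize the latent (analogous to optimizing diffused video latents).
latent = torch.randn(1, 4, 16, 16, requires_grad=True)
opt = torch.optim.Adam([latent], lr=0.05)
for _ in range(20):
    feats = feature_net(latent)
    loss = motion_supervision_loss(feats, handle=(8, 8), target=(8, 13))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full system this inner loop would alternate with point tracking (re-locating the handle as content moves) and run per frame under temporal-consistency constraints; the toy above only shows the gradient-through-features mechanism.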