ReVideo: Remake a Video with Motion and Content Control
May 22, 2024
Authors: Chong Mou, Mingdeng Cao, Xintao Wang, Zhaoyang Zhang, Ying Shan, Jian Zhang
cs.AI
Abstract
Despite significant advancements in video generation and editing using
diffusion models, achieving accurate and localized video editing remains a
substantial challenge. Additionally, most existing video editing methods
primarily focus on altering visual content, with limited research dedicated to
motion editing. In this paper, we present a novel attempt to Remake a Video
(ReVideo), which stands out from existing methods by allowing precise video
editing in specific areas through the specification of both content and motion.
Content editing is facilitated by modifying the first frame, while the
trajectory-based motion control offers an intuitive user interaction
experience. ReVideo addresses a new task involving the coupling and training
imbalance between content and motion control. To tackle this, we develop a
three-stage training strategy that progressively decouples these two aspects
from coarse to fine. Furthermore, we propose a spatiotemporal adaptive fusion
module to integrate content and motion control across various sampling steps
and spatial locations. Extensive experiments demonstrate that our ReVideo has
promising performance on several accurate video editing applications, i.e., (1)
locally changing video content while keeping the motion constant, (2) keeping
content unchanged and customizing new motion trajectories, (3) modifying both
content and motion trajectories. Our method can also seamlessly extend these
applications to multi-area editing without specific training, demonstrating its
flexibility and robustness.
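The abstract describes trajectory-based motion control but does not say how a user-drawn trajectory is turned into a model input. One common approach in trajectory-conditioned video diffusion, assumed here and not confirmed by this abstract, is to rasterize each trajectory into sparse per-frame displacement maps. The sketch below shows that idea for a single point trajectory; the function name `trajectory_to_sparse_flow` and the `(x, y)`-per-frame input format are hypothetical.

```python
import numpy as np


def trajectory_to_sparse_flow(trajectory, num_frames, height, width):
    """Rasterize one point trajectory into sparse per-frame displacement maps.

    Hypothetical helper, not the authors' code.
    trajectory: list of (x, y) pixel positions, one per frame.
    Returns an array of shape (num_frames - 1, 2, height, width). For each pair
    of consecutive frames, the pixel at the step's starting point stores its
    (dx, dy) displacement; every other pixel stays zero.
    """
    if len(trajectory) != num_frames:
        raise ValueError("expected one (x, y) point per frame")
    flow = np.zeros((num_frames - 1, 2, height, width), dtype=np.float32)
    for t in range(num_frames - 1):
        x0, y0 = trajectory[t]
        x1, y1 = trajectory[t + 1]
        # Snap the starting point to the nearest pixel; skip points outside the frame.
        col, row = int(round(x0)), int(round(y0))
        if 0 <= row < height and 0 <= col < width:
            flow[t, 0, row, col] = x1 - x0  # horizontal displacement
            flow[t, 1, row, col] = y1 - y0  # vertical displacement
    return flow


traj = [(2, 2), (3, 2), (4, 3), (5, 5)]
flow = trajectory_to_sparse_flow(traj, num_frames=4, height=8, width=8)
print(flow.shape)        # (3, 2, 8, 8)
print(flow[1, :, 2, 3])  # [1. 1.]
```

Maps this sparse are often smoothed (for example with a Gaussian filter) before conditioning so the signal covers more than one pixel; that step is left out here to keep the sketch minimal.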
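The abstract says the spatiotemporal adaptive fusion module integrates content and motion control "across various sampling steps and spatial locations" but gives no formula. A minimal sketch of that general idea, under stated assumptions, is a per-pixel weight that depends on both the sampling step and an edit mask and is used to blend content and motion condition features. The fixed scalars `a`, `b`, `c` stand in for whatever learned predictor the real module uses. Their signs, and the names `fusion_weight` and `adaptive_fuse`, are hypothetical and are not the authors' design.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fusion_weight(edit_mask, t, num_steps, a=4.0, b=-2.0, c=-1.0):
    """Per-pixel weight w in (0, 1) that varies with sampling step and location.

    edit_mask: (H, W) array, 1 inside the edited region and 0 elsewhere.
    t: current sampling step, from num_steps - 1 (noisiest) down to 0.
    a, b, c: hypothetical fixed scalars standing in for a learned predictor.
    """
    tau = t / max(num_steps - 1, 1)  # normalized timestep in [0, 1]
    return sigmoid(a * tau + b * edit_mask + c)


def adaptive_fuse(content_feat, motion_feat, edit_mask, t, num_steps):
    """Convex blend of content and motion condition features of shape (C, H, W)."""
    w = fusion_weight(edit_mask, t, num_steps)[None, :, :]  # broadcast over channels
    return w * content_feat + (1.0 - w) * motion_feat


content = np.ones((4, 8, 8))
motion = np.zeros((4, 8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0  # edited region

early = adaptive_fuse(content, motion, mask, t=49, num_steps=50)
late = adaptive_fuse(content, motion, mask, t=0, num_steps=50)
```

With all-ones content features and all-zeros motion features, the fused output equals the weight map itself. This makes it easy to check that the blend changes both between steps and between edited and unedited pixels. Because the output is a convex combination, it always stays between the two inputs.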