VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Abstract

Current diffusion-based video editing primarily focuses on structure-preserving editing, using various dense correspondences to ensure temporal consistency and motion alignment. However, these approaches are often ineffective when the target edit involves a shape change. To enable video editing with shape changes, this work explores customized video subject swapping. The aim is to replace the main subject in a source video with a target subject that has a distinct identity and potentially a different shape. In contrast to previous methods that rely on dense correspondences, we introduce the VideoSwap framework, which exploits semantic point correspondences. This design is motivated by our observation that only a small number of semantic points are needed to align the subject's motion trajectory and modify its shape. We also introduce several user-point interactions (e.g., removing points and dragging points) to handle different kinds of semantic point correspondence. Extensive experiments show that VideoSwap achieves state-of-the-art video subject swapping results across a variety of real-world videos.
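The abstract describes the subject's motion as a small set of semantic point trajectories that a user can edit by removing or dragging points. The sketch below models only that bookkeeping, and it rests on several assumptions:

- Each semantic point is stored as a list of per-frame (x, y) coordinates.
- A drag applies one constant offset to every frame.
- The names `SemanticPointTrack`, `PointCorrespondence`, `remove_point`, `drag_point` and `frame` are hypothetical.

This is not VideoSwap's implementation. It omits the diffusion model that would consume these points.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates


@dataclass
class SemanticPointTrack:
    """Hypothetical container: one named semantic point tracked across all frames."""
    name: str
    positions: List[Point]  # one (x, y) per frame


@dataclass
class PointCorrespondence:
    """Hypothetical sparse set of semantic points guiding the subject swap."""
    tracks: Dict[str, SemanticPointTrack] = field(default_factory=dict)

    def add(self, track: SemanticPointTrack) -> None:
        self.tracks[track.name] = track

    def remove_point(self, name: str) -> None:
        # "Removing points": drop a point (e.g., one with no counterpart on the
        # target subject) so it no longer constrains the edit.
        self.tracks.pop(name, None)

    def drag_point(self, name: str, dx: float, dy: float) -> None:
        # "Dragging points": ASSUMED here to shift the point by the same offset
        # in every frame, changing shape while keeping the source motion pattern.
        track = self.tracks[name]
        track.positions = [(x + dx, y + dy) for x, y in track.positions]

    def frame(self, t: int) -> Dict[str, Point]:
        # Sparse guidance for frame t: positions of the remaining points only.
        return {n: tr.positions[t] for n, tr in self.tracks.items()}


if __name__ == "__main__":
    corr = PointCorrespondence()
    corr.add(SemanticPointTrack("head", [(10.0, 20.0), (12.0, 21.0)]))
    corr.add(SemanticPointTrack("tail", [(40.0, 25.0), (42.0, 26.0)]))
    corr.add(SemanticPointTrack("left_ear", [(8.0, 12.0), (10.0, 13.0)]))
    corr.remove_point("left_ear")
    corr.drag_point("tail", 5.0, -2.0)
    print(corr.frame(1))  # {'head': (12.0, 21.0), 'tail': (47.0, 24.0)}
```

In this sketch the correspondence is a small dictionary of tracks rather than a dense per-pixel field. Each interaction therefore touches only the point it targets.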
