VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
December 4, 2023
Authors: Yuchao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang
cs.AI
Abstract
Current diffusion-based video editing primarily focuses on
structure-preserved editing by utilizing various dense correspondences to
ensure temporal consistency and motion alignment. However, these approaches are
often ineffective when the target edit involves a shape change. To embark on
video editing with shape change, we explore customized video subject swapping
in this work, where we aim to replace the main subject in a source video with a
target subject having a distinct identity and potentially different shape. In
contrast to previous methods that rely on dense correspondences, we introduce
the VideoSwap framework that exploits semantic point correspondences, inspired
by our observation that only a small number of semantic points are necessary to
align the subject's motion trajectory and modify its shape. We also introduce
various user-point interactions (e.g., removing points and dragging points) to
handle diverse semantic point correspondences. Extensive experiments
demonstrate state-of-the-art video subject swapping results across a variety of
real-world videos.
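The abstract's core idea is that a handful of semantic points, tracked over the source video, suffice to carry the subject's motion, and that user interactions (removing a point the target subject lacks, dragging a point to reshape it) edit this sparse correspondence before it guides generation. The paper does not specify an implementation, so the sketch below is a purely illustrative toy: `edit_point_trajectories` is a hypothetical helper, and the point names and coordinates are made up.

```python
# Toy sketch of editing sparse semantic-point trajectories, as described
# in the abstract. Hypothetical helper; not the authors' implementation.

def edit_point_trajectories(trajs, remove=(), drag=None):
    """trajs: dict mapping point name -> list of (x, y) per frame.
    remove: names of points to delete, e.g. when the target subject
            has no corresponding part (shape change via removal).
    drag:   dict mapping point name -> (dx, dy) offset applied to every
            frame, shifting the point to reshape the subject while
            preserving its motion trajectory.
    """
    edited = {name: [tuple(p) for p in traj]
              for name, traj in trajs.items() if name not in set(remove)}
    for name, (dx, dy) in (drag or {}).items():
        edited[name] = [(x + dx, y + dy) for x, y in edited[name]]
    return edited

# Example: three semantic points on a source subject over four frames,
# each moving rightward. Swapping in a target subject with no tail and
# taller ears: remove the tail point, drag the ear point upward.
trajs = {
    "nose": [(10, 20), (20, 20), (30, 20), (40, 20)],
    "tail": [(50, 25), (60, 25), (70, 25), (80, 25)],
    "ear":  [(12, 5),  (22, 5),  (32, 5),  (42, 5)],
}
edited = edit_point_trajectories(trajs, remove=["tail"], drag={"ear": (0, -3)})
print(sorted(edited))      # ['ear', 'nose']
print(edited["ear"][0])    # (12, 2)
```

Note how the surviving trajectories keep the original per-frame motion (the x-coordinates still advance by 10 each frame), which is what lets a sparse point set align the target subject's motion with the source while its shape differs.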