VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Abstract

Current diffusion-based video editing primarily focuses on structure-preserving editing, using various dense correspondences to ensure temporal consistency and motion alignment. However, these approaches are often ineffective when the target edit involves a shape change. To tackle video editing with shape change, we explore customized video subject swapping in this work: we aim to replace the main subject in a source video with a target subject that has a distinct identity and a potentially different shape. In contrast to previous methods that rely on dense correspondences, we introduce the VideoSwap framework, which exploits semantic point correspondences. It is inspired by our observation that only a small number of semantic points are needed to align the subject's motion trajectory and modify its shape. We also introduce various user-point interactions (e.g., removing points and dragging points) to handle various semantic point correspondences. Extensive experiments demonstrate state-of-the-art video subject swapping results across a variety of real-world videos.
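The core idea above, guiding the edit with a few semantic points instead of a dense correspondence and letting the user remove or drag those points, can be illustrated with a small data structure. The sketch below is hypothetical and is not the VideoSwap implementation: the class `SemanticPointTracks`, its methods, the point names, and the rule that a drag shifts a point by the same offset in every frame are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticPointTracks:
    """Hypothetical container: each named semantic point maps to one (x, y)
    position per video frame. Only a handful of points are tracked."""
    tracks: dict[str, list[tuple[float, float]]] = field(default_factory=dict)

    def remove_point(self, name: str) -> None:
        # Drop a semantic point that has no counterpart on the target subject.
        self.tracks.pop(name, None)

    def drag_point(self, name: str, dx: float, dy: float) -> None:
        # Simplifying assumption: shift the point by the same offset in every
        # frame, so its motion is kept while the subject's shape changes.
        self.tracks[name] = [(x + dx, y + dy) for x, y in self.tracks[name]]

    def frame(self, t: int) -> dict[str, tuple[float, float]]:
        # Sparse guidance for frame t: one position per remaining point.
        return {name: pts[t] for name, pts in self.tracks.items()}


# Usage with made-up coordinates for a three-frame clip.
tracks = SemanticPointTracks({
    "head": [(10.0, 5.0), (12.0, 5.0), (14.0, 6.0)],
    "tail": [(2.0, 5.0), (4.0, 5.0), (6.0, 6.0)],
    "left_wing": [(6.0, 2.0), (8.0, 2.0), (10.0, 3.0)],
})
tracks.remove_point("left_wing")      # e.g. the target subject has no wings
tracks.drag_point("tail", -1.0, 0.0)  # e.g. the target subject is longer
print(tracks.frame(2))  # {'head': (14.0, 6.0), 'tail': (5.0, 6.0)}
```

Keeping points sparse means each user interaction touches one short trajectory rather than a per-pixel map, which is the property the abstract relies on when it argues that a few points suffice to align motion while allowing shape change.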
