VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Abstract

Current diffusion-based video editing primarily focuses on structure-preserving editing, using various dense correspondences to ensure temporal consistency and motion alignment. However, these approaches are often ineffective when the target edit involves a shape change. As a first step toward video editing with shape change, we explore customized video subject swapping, where we aim to replace the main subject in a source video with a target subject that has a distinct identity and a potentially different shape. In contrast to previous methods that rely on dense correspondences, we introduce the VideoSwap framework, which exploits semantic point correspondences, inspired by our observation that only a small number of semantic points are necessary to align the subject's motion trajectory and modify its shape. We also introduce user-point interactions (e.g., removing points and dragging points) to handle different kinds of semantic point correspondence. Extensive experiments demonstrate state-of-the-art video subject swapping results across a variety of real-world videos.
