RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers

Abstract

We propose RoPECraft, a training-free video motion transfer method for diffusion transformers that operates solely by modifying their rotary positional embeddings (RoPE). We first extract dense optical flow from a reference video and use the resulting motion offsets to warp the complex-exponential tensors of RoPE, effectively encoding motion into the generation process. These embeddings are then further optimized during the denoising steps via trajectory alignment between the predicted and target velocities, using a flow-matching objective. To keep the output faithful to the text prompt and prevent duplicate generations, we add a regularization term based on the phase components of the reference video's Fourier transform, projecting the phase angles onto a smooth manifold to suppress high-frequency artifacts. Experiments on benchmarks show that RoPECraft outperforms all recently published methods, both qualitatively and quantitatively.
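
To make the idea of warping RoPE with motion offsets concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a single 1-D position axis, the standard RoPE frequency schedule with base 10000, and a plain mean-squared-error form for the velocity-matching term; the names `rope_freqs`, `rope_complex`, `warp_rope`, and `flow_matching_loss` are hypothetical and chosen only for illustration. The sketch shows that evaluating RoPE at a displaced position is the same as multiplying the original complex exponential by an extra phase.

```python
import numpy as np


def rope_freqs(dim, base=10000.0):
    """Standard RoPE inverse frequencies for an even head dimension `dim`."""
    return base ** (-np.arange(0, dim, 2) / dim)


def rope_complex(positions, dim, base=10000.0):
    """Complex-exponential RoPE tensor exp(i * p * theta_k).

    Returns an array of shape (len(positions), dim // 2).
    """
    angles = np.outer(positions, rope_freqs(dim, base))
    return np.exp(1j * angles)


def warp_rope(positions, offsets, dim, base=10000.0):
    """Hypothetical warp: evaluate RoPE at displaced positions p + delta.

    `offsets` stands in for per-token displacements derived from optical flow.
    """
    return rope_complex(positions + offsets, dim, base)


def flow_matching_loss(v_pred, v_target):
    """Assumed MSE between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))


# Toy example: 8 token positions with small, smoothly varying offsets.
positions = np.arange(8, dtype=np.float64)
offsets = np.linspace(-0.5, 0.5, 8)
warped = warp_rope(positions, offsets, dim=16)

# Displacing a position by delta multiplies its embedding by exp(i * delta * theta_k).
extra_phase = np.exp(1j * np.outer(offsets, rope_freqs(16)))
assert np.allclose(warped, rope_complex(positions, 16) * extra_phase)
```

In a setting like the one the abstract describes, the extra phase would be the quantity exposed to optimization during denoising, with a loss such as `flow_matching_loss` providing the gradient signal; how the actual method parameterizes and regularizes it is not specified here.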
