DragAnything: Motion Control for Anything using Entity Representation

Abstract

We introduce DragAnything, which uses an entity representation to achieve motion control for any object in controllable video generation. Compared with existing motion control methods, DragAnything offers several advantages. First, a trajectory-based approach is more user-friendly for interaction when acquiring other guidance signals (e.g., masks, depth maps) is labor-intensive: users only need to draw a line (trajectory) during interaction. Second, our entity representation serves as an open-domain embedding capable of representing any object, enabling motion control for diverse entities, including the background. Finally, our entity representation allows simultaneous and distinct motion control for multiple objects. Extensive experiments demonstrate that DragAnything achieves state-of-the-art performance on FVD, FID, and a user study, particularly in object motion control, where it surpasses previous methods (e.g., DragNUWA) by 26% in human voting.
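The abstract does not specify how the entity representation is built or how it is tied to a trajectory. The sketch below shows one plausible scheme under our own assumptions, not the paper's implementation: the entity embedding is the average of a per-pixel feature map over the entity's mask, and each frame's conditioning map places that embedding at the corresponding trajectory point, weighted by a 2D Gaussian. The function names `entity_embedding`, `gaussian_weight`, and `condition_maps`, the `sigma` parameter, and the toy feature map are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical sketch: NOT DragAnything's actual code or API.
Point = Tuple[float, float]        # (x, y) in pixel coordinates
Grid = List[List[List[float]]]     # feature map indexed as [y][x][channel]


def entity_embedding(features: Grid, mask: List[List[bool]]) -> List[float]:
    """Assumed scheme: average the C-dim feature vectors of all pixels inside the mask."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                for c in range(channels):
                    total[c] += features[y][x][c]
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [v / count for v in total]


def gaussian_weight(x: int, y: int, cx: float, cy: float, sigma: float) -> float:
    """Unnormalized 2D Gaussian centered on the trajectory point (cx, cy); equals 1.0 at the center."""
    return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))


def condition_maps(embedding: List[float], trajectory: List[Point],
                   height: int, width: int, sigma: float = 1.0) -> List[Grid]:
    """One [y][x][channel] map per trajectory point: the embedding scaled by a Gaussian around that point."""
    maps = []
    for cx, cy in trajectory:
        frame = [
            [[gaussian_weight(x, y, cx, cy, sigma) * e for e in embedding]
             for x in range(width)]
            for y in range(height)
        ]
        maps.append(frame)
    return maps


# Toy usage (hypothetical data): 2-channel features, channel 0 = 1.0, channel 1 = x coordinate.
H, W = 4, 4
features = [[[1.0, float(x)] for x in range(W)] for y in range(H)]
mask = [[x < 2 for x in range(W)] for y in range(H)]   # entity occupies the left two columns
emb = entity_embedding(features, mask)                  # mean over 8 masked pixels
traj = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]             # user-drawn path, one point per frame
maps = condition_maps(emb, traj, H, W)
print(emb, len(maps), maps[1][1][1])                    # [1.0, 0.5] 3 [1.0, 0.5]
```

Because the embedding is computed from whatever pixels the mask covers, the same routine applies equally to a foreground object or a background region, and calling `condition_maps` once per entity yields independent per-entity trajectories; how such per-entity maps would be merged and fed to a video diffusion model is left open in this sketch.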
