DragAnything: Motion Control for Anything using Entity Representation
March 12, 2024
Authors: Wejia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang
cs.AI
Abstract
We introduce DragAnything, which utilizes an entity representation to achieve
motion control for any object in controllable video generation. Compared to
existing motion control methods, DragAnything offers several advantages.
Firstly, trajectory-based interaction is more user-friendly, since acquiring
other guidance signals (e.g., masks, depth maps) is labor-intensive: users only
need to draw a line (trajectory) during interaction. Secondly, our entity
representation serves as an open-domain embedding capable of representing any
object, enabling motion control for diverse entities, including the
background. Lastly, our entity representation allows simultaneous and distinct
motion control for multiple objects. Extensive experiments demonstrate that
DragAnything achieves state-of-the-art performance in FVD, FID, and a user
study, particularly for object motion control, where our method surpasses
previous methods (e.g., DragNUWA) by 26% in human voting.