DragAnything: Motion Control for Anything using Entity Representation
March 12, 2024
Authors: Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang
cs.AI
Abstract
We introduce DragAnything, which utilizes an entity representation to achieve
motion control for any object in controllable video generation. Compared to
existing motion control methods, DragAnything offers several advantages.
First, trajectory-based control is more user-friendly for interaction, since
acquiring other guidance signals (e.g., masks, depth maps) is labor-intensive;
users only need to draw a line (trajectory) during interaction. Second, our
entity representation serves as an open-domain embedding capable of
representing any object, enabling motion control for diverse entities,
including the background. Lastly, our entity representation allows
simultaneous and distinct motion control for multiple objects. Extensive
experiments demonstrate that DragAnything achieves state-of-the-art
performance on FVD, FID, and user studies, particularly in object motion
control, where our method surpasses previous methods (e.g., DragNUWA) by 26%
in human voting.
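To make the abstract's two ingredients concrete, below is a minimal, hypothetical sketch of how an entity embedding might be paired with a user-drawn trajectory to condition a video model. The helper names (extract_entity_embedding, trajectory_heatmaps), the mask-pooling of latent features, and the Gaussian heatmap encoding are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch (not the authors' code): combine an entity
# representation with a drawn trajectory into a conditioning signal.
import torch

def extract_entity_embedding(latent_feats, mask):
    """Pool latent features over an entity's mask into one embedding.

    latent_feats: (C, H, W) feature map, e.g. from a diffusion backbone
                  (assumed source).
    mask: (H, W) binary mask of the entity in the first frame.
    """
    mask = mask.float()
    pooled = (latent_feats * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)
    return pooled  # (C,)

def trajectory_heatmaps(points, size, sigma=4.0):
    """Rasterize a drawn trajectory into per-frame 2D Gaussian heatmaps.

    points: list of (x, y) positions, one per frame.
    size: (H, W) of each heatmap.
    """
    H, W = size
    ys = torch.arange(H).view(H, 1).float()
    xs = torch.arange(W).view(1, W).float()
    maps = []
    for (x, y) in points:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        maps.append(torch.exp(-d2 / (2 * sigma ** 2)))
    return torch.stack(maps)  # (T, H, W)

# Toy usage: one entity, a 5-frame straight-line drag.
C, H, W = 8, 32, 32
latent_feats = torch.randn(C, H, W)
mask = torch.zeros(H, W)
mask[10:16, 10:16] = 1.0
entity = extract_entity_embedding(latent_feats, mask)  # (C,)
traj = [(12 + 2 * t, 13) for t in range(5)]            # the drawn line
heat = trajectory_heatmaps(traj, (H, W))               # (5, H, W)
# Broadcast the entity embedding along the trajectory to form a
# per-frame conditioning volume (T, C, H, W) for the video generator.
cond = entity.view(1, C, 1, 1) * heat.unsqueeze(1)
print(cond.shape)  # torch.Size([5, 8, 32, 32])
```

Broadcasting one embedding per entity along its own heatmap is one simple way to encode "what moves" together with "where it moves", and it extends naturally to several entities (including the background) by summing their per-entity conditioning volumes; the paper's actual conditioning mechanism may differ.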