DragAnything: Motion Control for Anything using Entity Representation

Abstract

We introduce DragAnything, which uses an entity representation to achieve motion control for any object in controllable video generation. Compared with existing motion control methods, DragAnything offers several advantages. First, trajectory-based control is more user-friendly, because acquiring other guidance signals (e.g., masks, depth maps) is labor-intensive; users only need to draw a line (trajectory) during interaction. Second, our entity representation serves as an open-domain embedding capable of representing any object, enabling motion control for diverse entities, including the background. Lastly, our entity representation allows simultaneous and distinct motion control for multiple objects. Extensive experiments demonstrate that DragAnything achieves state-of-the-art performance on FVD, FID, and a user study, particularly in object motion control, where our method surpasses previous methods (e.g., DragNUWA) by 26% in human voting.
