DragAnything: Motion Control for Anything using Entity Representation
March 12, 2024
Authors: Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang
cs.AI
Abstract
We introduce DragAnything, which utilizes an entity representation to achieve
motion control for any object in controllable video generation. Compared to
existing motion control methods, DragAnything offers several advantages.
First, trajectory-based control is more user-friendly for interaction, since
acquiring other guidance signals (e.g., masks, depth maps) is labor-intensive;
users only need to draw a line (trajectory) during interaction. Second, our
entity representation serves as an open-domain embedding capable of
representing any object, enabling motion control for diverse entities,
including the background. Lastly, our entity representation allows
simultaneous and distinct motion control for multiple objects. Extensive
experiments demonstrate that DragAnything achieves state-of-the-art
performance on FVD, FID, and user studies, particularly in object motion
control, where our method surpasses previous methods (e.g., DragNUWA) by 26%
in human voting.
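To make the abstract's two ingredients concrete, below is a minimal, hypothetical sketch of how an entity embedding might be paired with a user-drawn trajectory to condition a video model. The helper names (extract_entity_embedding, trajectory_heatmaps), the mask-pooling of latent features, and the Gaussian heatmap encoding are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch (not the authors' code): combine an entity
# representation with a drawn trajectory into a conditioning signal.
import torch

def extract_entity_embedding(latent_feats, mask):
    """Pool latent features over an entity's mask into one embedding.

    latent_feats: (C, H, W) feature map, e.g. from a diffusion backbone
                  (assumed source).
    mask: (H, W) binary mask of the entity in the first frame.
    """
    mask = mask.float()
    pooled = (latent_feats * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)
    return pooled  # (C,)

def trajectory_heatmaps(points, size, sigma=4.0):
    """Rasterize a drawn trajectory into per-frame 2D Gaussian heatmaps.

    points: list of (x, y) positions, one per frame.
    size: (H, W) of each heatmap.
    """
    H, W = size
    ys = torch.arange(H).view(H, 1).float()
    xs = torch.arange(W).view(1, W).float()
    maps = []
    for (x, y) in points:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        maps.append(torch.exp(-d2 / (2 * sigma ** 2)))
    return torch.stack(maps)  # (T, H, W)

# Toy usage: one entity, a 5-frame straight-line drag.
C, H, W = 8, 32, 32
latent_feats = torch.randn(C, H, W)
mask = torch.zeros(H, W)
mask[10:16, 10:16] = 1.0
entity = extract_entity_embedding(latent_feats, mask)  # (C,)
traj = [(12 + 2 * t, 13) for t in range(5)]            # the drawn line
heat = trajectory_heatmaps(traj, (H, W))               # (5, H, W)
# Broadcast the entity embedding along the trajectory to form a
# per-frame conditioning volume (T, C, H, W) for the video generator.
cond = entity.view(1, C, 1, 1) * heat.unsqueeze(1)
print(cond.shape)  # torch.Size([5, 8, 32, 32])
```

Broadcasting one embedding per entity along its own heatmap is one simple way to encode "what moves" together with "where it moves", and it extends naturally to several entities (including the background) by summing their per-entity conditioning volumes; the paper's actual conditioning mechanism may differ.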