Motion Prompting: Controlling Video Generation with Motion Trajectories

Abstract

Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal compositions. To this end, we train a video generation model conditioned on spatio-temporally sparse or dense motion trajectories. In contrast to prior motion conditioning work, this flexible representation can encode any number of trajectories, object-specific or global scene motion, and temporally sparse motion; due to its flexibility we refer to this conditioning as motion prompts. While users may directly specify sparse trajectories, we also show how to translate high-level user requests into detailed, semi-dense motion prompts, a process we term motion prompt expansion. We demonstrate the versatility of our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing. Our results showcase emergent behaviors, such as realistic physics, suggesting the potential of motion prompts for probing video models and interacting with future generative world models. Finally, we evaluate quantitatively, conduct a human study, and demonstrate strong performance. Video results are available on our webpage: https://motion-prompting.github.io/
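
To make the idea of a motion prompt more concrete, the sketch below shows one possible way to picture it: a set of point trajectories, each holding an (x, y) position per frame, with unspecified frames left empty to express temporal sparsity. It also illustrates the flavor of motion prompt expansion by turning a single user drag into a semi-dense bundle of parallel tracks. This is an illustration under stated assumptions, not the authors' implementation: the names `Track`, `expand_drag`, and `sparsify_in_time`, the disc-shaped neighborhood, and the linear interpolation are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Track:
    """One point trajectory: a position per frame, or None where unspecified."""
    points: List[Optional[Point]]


def expand_drag(start: Point, end: Point, num_frames: int,
                radius: float = 10.0, step: float = 5.0) -> List[Track]:
    """Turn one drag into a semi-dense set of linearly interpolated tracks.

    Hypothetical helper: every grid point within `radius` of `start`
    receives the same displacement as the drag itself.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    dx, dy = end[0] - start[0], end[1] - start[1]
    n = int(radius // step)
    tracks = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            ox, oy = i * step, j * step
            if ox * ox + oy * oy > radius * radius:
                continue  # keep only grid points inside the disc
            pts = []
            for t in range(num_frames):
                a = t / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
                pts.append((start[0] + ox + a * dx, start[1] + oy + a * dy))
            tracks.append(Track(points=pts))
    return tracks


def sparsify_in_time(track: Track, keep_every: int) -> Track:
    """Keep every `keep_every`-th frame and mark the rest as unspecified (None)."""
    return Track(points=[p if t % keep_every == 0 else None
                         for t, p in enumerate(track.points)])
```

In this toy representation, a single `Track` corresponds to a sparse prompt, the list returned by `expand_drag` to a semi-dense one, and `sparsify_in_time` to a prompt that constrains motion only at selected frames. How such trajectories are actually encoded and fed to the video model is not specified in the abstract.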
