OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

Abstract

OmniLottie is a versatile framework that generates high-quality vector animations from multimodal instructions. For flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format for representing both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, which pose significant challenges for learning vector animation generation. We therefore introduce a carefully designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer enables us to build OmniLottie on pretrained vision-language models, so that it can follow multimodal interleaved instructions and generate high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Extensive experiments validate that OmniLottie produces vivid, semantically aligned vector animations that adhere closely to multimodal human instructions.
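The abstract describes the tokenizer only at a high level. The following is a minimal sketch of the general idea, not OmniLottie's actual tokenizer. The command names (`ANIM`, `LAYER`, `POS_ANIM`, `KF`, `ELLIPSE`, `FILL`, and so on) and the functions `param_tokens`, `shape_tokens`, and `tokenize_lottie` are hypothetical. The sketch walks a Lottie dict and keeps only a few transform and shape properties. It quantizes numbers to integers and drops invariant metadata such as the version string `v`, names `nm`, and flags like `ddd` and `ind`. For brevity, it ignores keyframe easing handles and handles only shape layers (`ty == 4`) containing ellipse, rectangle, and fill items.

```python
from typing import Any

# Hypothetical command vocabulary; the abstract does not specify OmniLottie's real token set.
SHAPE_COMMANDS = {"el": "ELLIPSE", "rc": "RECT", "fl": "FILL"}
TRANSFORM_KEYS = (("p", "POS"), ("s", "SCALE"), ("r", "ROT"), ("o", "OPACITY"))


def param_tokens(name: str, prop: dict[str, Any], scale: float = 1.0) -> list[Any]:
    """Tokenize one Lottie property, either static ("a": 0) or keyframed ("a": 1)."""
    if prop.get("a", 0) == 0:
        raw = prop["k"]
        values = raw if isinstance(raw, list) else [raw]
        # Quantize each component to an integer parameter token.
        return [name, *(round(v * scale) for v in values)]
    tokens: list[Any] = [name + "_ANIM"]
    for kf in prop["k"]:
        # Each keyframe becomes: KF <frame> <start value components>.
        # Easing handles ("i"/"o") are ignored in this sketch.
        start = kf["s"] if isinstance(kf["s"], list) else [kf["s"]]
        tokens += ["KF", round(kf["t"]), *(round(v * scale) for v in start)]
    return tokens


def shape_tokens(item: dict[str, Any]) -> list[Any]:
    """Tokenize one shape item; unsupported shape types are skipped."""
    cmd = SHAPE_COMMANDS.get(item.get("ty"))
    if cmd is None:
        return []
    tokens: list[Any] = [cmd]
    if cmd in ("ELLIPSE", "RECT"):
        tokens += param_tokens("POS", item["p"]) + param_tokens("SIZE", item["s"])
        if cmd == "RECT" and "r" in item:
            tokens += param_tokens("RADIUS", item["r"])
    else:
        # FILL: Lottie stores RGBA components in [0, 1]; quantize them to 0..255.
        tokens += param_tokens("COLOR", item["c"], scale=255)
    return tokens


def tokenize_lottie(doc: dict[str, Any]) -> list[Any]:
    """Flatten a Lottie dict into a command/parameter sequence.

    Invariant metadata (e.g. "v", "nm", "ddd", "ind") is never emitted.
    """
    tokens: list[Any] = ["ANIM", doc["w"], doc["h"], round(doc["fr"]),
                         round(doc["ip"]), round(doc["op"])]
    for layer in doc.get("layers", []):
        if layer.get("ty") != 4:  # keep only shape layers in this sketch
            continue
        tokens.append("LAYER")
        transform = layer.get("ks", {})
        for key, name in TRANSFORM_KEYS:
            if key in transform:
                tokens += param_tokens(name, transform[key])
        for item in layer.get("shapes", []):
            tokens += shape_tokens(item)
        tokens.append("END_LAYER")
    tokens.append("EOS")
    return tokens


if __name__ == "__main__":
    # A red circle sliding from x=100 to x=412 over 60 frames at 30 fps.
    demo = {
        "v": "5.7.4", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512, "nm": "demo",
        "layers": [{
            "ddd": 0, "ind": 1, "ty": 4, "nm": "circle",
            "ks": {
                "o": {"a": 0, "k": 100},
                "p": {"a": 1, "k": [{"t": 0, "s": [100, 256]},
                                    {"t": 60, "s": [412, 256]}]},
            },
            "shapes": [
                {"ty": "el", "p": {"a": 0, "k": [0, 0]}, "s": {"a": 0, "k": [80, 80]}},
                {"ty": "fl", "c": {"a": 0, "k": [1, 0, 0, 1]}},
            ],
        }],
    }
    print(tokenize_lottie(demo))
```

On the demo, the output begins `ANIM 512 512 30 0 60 LAYER POS_ANIM KF 0 100 256 ...`. The JSON nesting, key names, and metadata disappear, leaving a flat command-and-parameter sequence that a language model could be trained to emit.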