OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

Abstract

OmniLottie is a versatile framework that generates high-quality vector animations from multimodal instructions. For flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format that represents both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, which makes learning vector animation generation significantly harder. We therefore introduce a Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer lets us build OmniLottie on top of pretrained vision-language models so that it can follow interleaved multimodal instructions and generate high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Through extensive experiments, we validate that OmniLottie produces vivid, semantically aligned vector animations that adhere closely to multimodal human instructions.
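
As a rough illustration of what turning Lottie JSON into command and parameter sequences could look like, the Python sketch below walks a tiny Lottie document and emits flat tokens while dropping metadata such as names and version strings. It is only a toy under stated assumptions: the command names (CANVAS, LAYER, RECT, ELLIPSE, SET, KEYFRAME) and the functions `tokenize_property` and `tokenize_lottie` are hypothetical and are not taken from OmniLottie. Only the Lottie field names (`w`, `h`, `fr`, `layers`, `ty`, `shapes`, `a`, `k`, `t`, `s`, `p`) follow the Lottie format.

```python
import json

# Hypothetical command names for illustration; not OmniLottie's vocabulary.
SHAPE_COMMANDS = {"rc": "RECT", "el": "ELLIPSE"}


def tokenize_property(name, prop):
    """Turn one Lottie property ({"a": 0 or 1, "k": ...}) into tokens."""
    if prop.get("a", 0) == 1:
        # Animated property: "k" is a list of keyframes with time "t" and value "s".
        return [("KEYFRAME", name, kf["t"], *kf.get("s", [])) for kf in prop["k"]]
    # Static property: "k" holds the value itself (scalar or list).
    value = prop["k"]
    values = value if isinstance(value, list) else [value]
    return [("SET", name, *values)]


def tokenize_lottie(doc):
    """Flatten a Lottie document into (command, *parameters) tuples."""
    tokens = [("CANVAS", doc["w"], doc["h"], doc["fr"])]
    for layer in doc.get("layers", []):
        if layer.get("ty") != 4:  # 4 is the Lottie type code for shape layers
            continue
        tokens.append(("LAYER", layer.get("ip", 0), layer.get("op", 0)))
        for shape in layer.get("shapes", []):
            command = SHAPE_COMMANDS.get(shape.get("ty"))
            if command is None:  # skip shape types this sketch does not cover
                continue
            tokens.append((command,))
            for name in ("p", "s"):  # position and size
                if name in shape:
                    tokens.extend(tokenize_property(name, shape[name]))
    return tokens


example = json.loads("""
{"v": "5.7.0", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
 "layers": [{"ty": 4, "nm": "box", "ip": 0, "op": 60,
   "shapes": [{"ty": "rc", "nm": "Rectangle",
     "s": {"a": 0, "k": [100, 50]},
     "p": {"a": 1, "k": [{"t": 0, "s": [0, 0]}, {"t": 30, "s": [200, 0]}]}}]}]}
""")

for token in tokenize_lottie(example):
    print(token)
```

In this toy example the rectangle's animated position becomes two KEYFRAME tokens and its static size becomes one SET token, while the `nm` and `v` fields never reach the output sequence.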
