OmniLottie: Vector Animation Generation with Parameterized Lottie Tokens

Abstract

OmniLottie is a versatile framework that generates high-quality vector animations from multi-modal instructions. To enable flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format that represents both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, which make learning to generate vector animations difficult. We therefore introduce a carefully designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer enables us to build OmniLottie on pretrained vision-language models so that it can follow multi-modal interleaved instructions and generate high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Extensive experiments validate that OmniLottie produces vivid, semantically aligned vector animations that adhere closely to multi-modal human instructions.
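To illustrate the general idea of replacing raw Lottie JSON with a sequence of commands and parameters, here is a minimal Python sketch. It is not the OmniLottie tokenizer, whose vocabulary is not specified here: the command names (`LAYER`, `ELLIPSE`, `RECT`, `FILL`, `SET_POS`, `KEY_POS`), the rounding-based quantization, and the helpers `quantize`, `tokenize_property`, and `tokenize_layer` are all hypothetical. It assumes a single shape layer whose shapes are static and not nested inside group (`gr`) items, and it simply drops metadata fields such as `nm` and `ind`.

```python
from typing import Any

# Hypothetical token: a command name plus its integer parameters.
Token = tuple[str, list[int]]

# Hypothetical mapping from Lottie shape types to command names (illustrative only).
SHAPE_COMMANDS = {"el": "ELLIPSE", "rc": "RECT"}


def quantize(values: list[float], scale: float = 1.0) -> list[int]:
    """Scale and round floats so each parameter maps to a discrete value."""
    return [round(v * scale) for v in values]


def tokenize_property(name: str, prop: dict[str, Any], scale: float = 1.0) -> list[Token]:
    """A static property ("a": 0) becomes one SET token;
    an animated one ("a": 1) becomes one KEY token per keyframe (time, value...)."""
    if prop.get("a") == 1:
        return [(f"KEY_{name}", [kf["t"], *quantize(kf["s"], scale)]) for kf in prop["k"]]
    return [(f"SET_{name}", quantize(prop["k"], scale))]


def tokenize_layer(layer: dict[str, Any]) -> list[Token]:
    """Flatten one shape layer; metadata such as 'nm' (name) and 'ind' (index) is ignored."""
    tokens: list[Token] = [("LAYER", [layer["ip"], layer["op"]])]  # in/out frames
    for shape in layer.get("shapes", []):
        if shape["ty"] in SHAPE_COMMANDS:
            # Assumes static geometry: position followed by size.
            params = quantize(shape["p"]["k"]) + quantize(shape["s"]["k"])
            tokens.append((SHAPE_COMMANDS[shape["ty"]], params))
        elif shape["ty"] == "fl":
            # Lottie stores RGBA colours in [0, 1]; keep RGB and map it to 0..255.
            tokens.append(("FILL", quantize(shape["c"]["k"][:3], 255)))
    # Layer transform: only the position property is handled in this sketch.
    tokens += tokenize_property("POS", layer["ks"]["p"])
    return tokens


if __name__ == "__main__":
    # A ball that moves horizontally over 60 frames.
    layer = {
        "ty": 4, "nm": "ball", "ind": 1, "ip": 0, "op": 60,
        "ks": {"p": {"a": 1, "k": [{"t": 0, "s": [100, 100]}, {"t": 60, "s": [400, 100]}]}},
        "shapes": [
            {"ty": "el", "p": {"a": 0, "k": [0, 0]}, "s": {"a": 0, "k": [80, 80]}},
            {"ty": "fl", "c": {"a": 0, "k": [1, 0.5, 0, 1]}},
        ],
    }
    print(tokenize_layer(layer))
    # [('LAYER', [0, 60]), ('ELLIPSE', [0, 0, 80, 80]), ('FILL', [255, 128, 0]),
    #  ('KEY_POS', [0, 100, 100]), ('KEY_POS', [60, 400, 100])]
```

The resulting five tokens carry only geometry, colour, and motion, while names, indices, and JSON punctuation disappear; this is the kind of compression the abstract motivates, though a real tokenizer would also need to cover groups, paths, easing curves, and many more properties.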