OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

Abstract

OmniLottie is a versatile framework that generates high-quality vector animations from multi-modal instructions. For flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format for representing both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, which pose significant challenges for learning vector animation generation. We therefore introduce a well-designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer enables us to build OmniLottie on top of pretrained vision-language models so that it can follow multi-modal interleaved instructions and generate high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Through extensive experiments, we validate that OmniLottie produces vivid and semantically aligned vector animations that adhere closely to multi-modal human instructions.
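To make the tokenizer idea concrete, the following is a minimal sketch of flattening a Lottie JSON document into a sequence of commands and parameters. It is not the OmniLottie tokenizer: the command names (`CANVAS`, `LAYER`, `ELLIPSE`, `SET`, `ANIMATE`, `KEY`, and so on), the helper functions `property_tokens` and `tokenize`, and the choice of which fields to drop as metadata are all assumptions made for illustration. Only the Lottie field names (`w`, `h`, `fr`, `layers`, `shapes`, `ks`, `ty`, `a`, `k`, `t`, `s`) come from the Lottie format itself.

```python
import json

# Hypothetical command vocabulary; the abstract does not specify the real one.
SHAPE_COMMANDS = {"el": "ELLIPSE", "rc": "RECT", "fl": "FILL"}
SHAPE_PARAMS = {"p": "CENTER", "s": "SIZE", "c": "COLOR"}
TRANSFORM_COMMANDS = {"p": "POSITION", "s": "SCALE", "r": "ROTATION", "o": "OPACITY"}


def property_tokens(name, prop):
    """Flatten one Lottie property ({"a": 0 or 1, "k": ...}) into tokens."""
    if prop.get("a") == 1:
        # Animated property: "k" is a list of keyframes with time "t" and value "s".
        tokens = ["ANIMATE", name]
        for keyframe in prop["k"]:
            tokens.append("KEY")
            tokens.append(keyframe["t"])
            tokens.extend(keyframe.get("s", []))
        return tokens
    # Static property: "k" holds the value directly (a scalar or a list).
    value = prop["k"]
    values = value if isinstance(value, list) else [value]
    return ["SET", name, *values]


def tokenize(animation):
    """Emit a flat command/parameter sequence, ignoring all other fields."""
    tokens = ["CANVAS", animation["w"], animation["h"], animation["fr"]]
    for layer in animation.get("layers", []):
        tokens.append("LAYER")
        for shape in layer.get("shapes", []):
            command = SHAPE_COMMANDS.get(shape.get("ty"))
            if command is None:
                continue  # shape types outside this sketch are skipped
            tokens.append(command)
            for key, name in SHAPE_PARAMS.items():
                if key in shape:
                    tokens.extend(property_tokens(name, shape[key]))
        transform = layer.get("ks", {})
        for key, name in TRANSFORM_COMMANDS.items():
            if key in transform:
                tokens.extend(property_tokens(name, transform[key]))
    return tokens


example = json.loads("""
{"v": "5.7.0", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
 "layers": [{"ty": 4, "nm": "dot", "ind": 1,
   "shapes": [{"ty": "el",
               "p": {"a": 0, "k": [256, 256]},
               "s": {"a": 0, "k": [100, 100]}}],
   "ks": {"r": {"a": 1, "k": [{"t": 0, "s": [0]}, {"t": 60, "s": [360]}]}}}]}
""")
tokens = tokenize(example)
print(tokens)
```

In this toy example, a spinning ellipse becomes 22 tokens, and fields such as the version string, layer name, and layer index produce no tokens at all. A real tokenizer would also need to cover paths, groups, easing curves, and the many other property types defined by the format.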
