AniClipart: Clipart Animation with Text-to-Video Priors

April 18, 2024
Authors: Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao
cs.AI

Abstract

Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving this problem. Nevertheless, direct application of text-to-video generation models often struggles to retain the visual identity of clipart images or generate cartoon-style motions, resulting in unsatisfactory animation outcomes. In this paper, we introduce AniClipart, a system that transforms static clipart images into high-quality motion sequences guided by text-to-video priors. To generate cartoon-style and smooth motion, we first define Bézier curves over keypoints of the clipart image as a form of motion regularization. We then align the motion trajectories of the keypoints with the provided text prompt by optimizing the Video Score Distillation Sampling (VSDS) loss, which encodes adequate knowledge of natural motion within a pretrained text-to-video diffusion model. With a differentiable As-Rigid-As-Possible shape deformation algorithm, our method can be end-to-end optimized while maintaining deformation rigidity. Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models in terms of text-video alignment, visual identity preservation, and motion consistency. Furthermore, we showcase the versatility of AniClipart by adapting it to generate a broader array of animation formats, such as layered animation, which allows topological changes.
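
The idea of using Bézier curves as keypoint trajectories can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the curve degree, so a cubic curve is assumed here, and the function name `cubic_bezier`, the array shapes, and the example keypoints are all hypothetical. The sketch only samples per-frame keypoint positions; in the described system the control points would be the parameters optimized by the VSDS loss, and each frame would be produced by As-Rigid-As-Possible deformation driven by the moved keypoints.

```python
import numpy as np


def cubic_bezier(control_points: np.ndarray, num_frames: int) -> np.ndarray:
    """Sample one cubic Bezier trajectory per keypoint (assumed degree: 3).

    control_points: array of shape (K, 4, 2), four 2D control points per keypoint.
    Returns an array of shape (num_frames, K, 2) with per-frame keypoint positions.
    """
    t = np.linspace(0.0, 1.0, num_frames)  # curve parameter, one value per frame
    # Cubic Bernstein basis, shape (num_frames, 4); each row sums to 1.
    basis = np.stack(
        [(1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3],
        axis=1,
    )
    # Blend the control points: (F, 4) x (K, 4, 2) -> (F, K, 2).
    return np.einsum("fi,kid->fkd", basis, control_points)


# Two hypothetical keypoints on a clipart image (illustrative coordinates).
keypoints = np.array([[0.0, 0.0], [1.0, 2.0]])

# Start with every control point on its keypoint, which gives a static trajectory.
control = np.repeat(keypoints[:, None, :], 4, axis=1)  # shape (2, 4, 2)

# Displace the control points of keypoint 0 so that it follows a smooth arc.
control[0, 1] += [0.0, 1.0]
control[0, 2] += [1.0, 1.0]
control[0, 3] += [1.0, 0.0]

trajectory = cubic_bezier(control, num_frames=5)
print(trajectory.shape)
```

Because every frame position is a smooth blend of a few control points, the trajectory cannot jitter from frame to frame, which is one way to read the abstract's description of the curves as "a form of motion regularization".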
