AniClipart: Clipart Animation with Text-to-Video Priors

Abstract

Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving this problem. Nevertheless, direct application of text-to-video generation models often struggles to retain the visual identity of clipart images or generate cartoon-style motions, resulting in unsatisfactory animation outcomes. In this paper, we introduce AniClipart, a system that transforms static clipart images into high-quality motion sequences guided by text-to-video priors. To generate cartoon-style and smooth motion, we first define Bézier curves over keypoints of the clipart image as a form of motion regularization. We then align the motion trajectories of the keypoints with the provided text prompt by optimizing the Video Score Distillation Sampling (VSDS) loss, which encodes adequate knowledge of natural motion within a pretrained text-to-video diffusion model. With a differentiable As-Rigid-As-Possible shape deformation algorithm, our method can be end-to-end optimized while maintaining deformation rigidity. Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models in terms of text-video alignment, visual identity preservation, and motion consistency. Furthermore, we showcase the versatility of AniClipart by adapting it to generate a broader array of animation formats, such as layered animation, which allows topological changes.
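To make the idea of Bézier curves over keypoints more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes cubic curves (the abstract does not state the degree) and 2D keypoints, and it samples one position per frame for each keypoint. The function names `cubic_bezier` and `keypoint_trajectories` and the example control points are hypothetical. The VSDS optimization and the As-Rigid-As-Possible deformation are not shown.

```python
import numpy as np


def cubic_bezier(control_points: np.ndarray, num_frames: int) -> np.ndarray:
    """Sample a cubic Bezier curve at evenly spaced parameter values.

    control_points: array of shape (4, 2) holding the 2D points P0..P3.
    Returns an array of shape (num_frames, 2), one position per frame.
    """
    t = np.linspace(0.0, 1.0, num_frames)[:, None]  # shape (num_frames, 1)
    p0, p1, p2, p3 = control_points
    # Bernstein form of the cubic Bezier curve.
    return (
        (1 - t) ** 3 * p0
        + 3 * (1 - t) ** 2 * t * p1
        + 3 * (1 - t) * t ** 2 * p2
        + t ** 3 * p3
    )


def keypoint_trajectories(all_control_points: np.ndarray, num_frames: int) -> np.ndarray:
    """Build one trajectory per keypoint.

    all_control_points: array of shape (K, 4, 2), four control points per keypoint.
    Returns an array of shape (num_frames, K, 2).
    """
    return np.stack(
        [cubic_bezier(cp, num_frames) for cp in all_control_points], axis=1
    )


# Hypothetical example: one keypoint that moves along an arc, one that stays put.
control_points = np.array(
    [
        [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]],
        [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0], [2.0, 2.0]],
    ]
)
frames = keypoint_trajectories(control_points, num_frames=5)
```

Because every frame position is a smooth function of a handful of control points, a trajectory parameterized this way cannot jitter from frame to frame, which is one way to read the abstract's description of the curves as a form of motion regularization.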
