Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
July 22, 2024
Authors: Xin Ma, Yaohui Wang, Gengyu Jia, Xinyuan Chen, Yuan-Fang Li, Cunjian Chen, Yu Qiao
cs.AI
Abstract
Diffusion models have achieved great progress in image animation due to
their powerful generative capabilities. However, maintaining spatio-temporal
consistency with the details of the input static image (e.g., its style,
background, and objects) and ensuring smoothness in animated video narratives
guided by textual prompts remain challenging. In this paper, we introduce
Cinemo, a novel image animation approach aimed at achieving better motion
controllability, as well as stronger temporal consistency and smoothness.
In general, we propose three effective
strategies at the training and inference stages of Cinemo to accomplish our
goal. At the training stage, Cinemo uses a motion diffusion model to learn the
distribution of motion residuals, rather than directly predicting subsequent
frames. Additionally, a structural similarity index (SSIM)-based strategy is
proposed to enable Cinemo to have better controllability of motion intensity.
At the inference stage, a noise refinement technique based on the discrete
cosine transform (DCT) is introduced to mitigate sudden motion changes. These
three strategies enable Cinemo to produce highly consistent, smooth, and
motion-controllable results. Compared to previous methods, Cinemo offers
simpler and more precise user controllability. Extensive experiments against
several state-of-the-art methods, including both commercial tools and research
approaches, across multiple metrics, demonstrate the effectiveness and
superiority of our proposed approach.
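To make two of the training-stage ideas above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a "motion residual" can be illustrated as the difference between each later frame and the input (first) frame, and that an SSIM-based motion-intensity score can be illustrated as one minus the mean SSIM between the first frame and each later frame. The function names (`motion_residuals`, `global_ssim`, `motion_intensity`), the single-window SSIM, and the `1 - SSIM` mapping are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def motion_residuals(frames):
    """Return frames[1:] - frames[0] for a (T, H, W) clip.

    Assumption: the residual is taken relative to the first (input) frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return frames[1:] - frames[0]


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM computed over the whole image.

    Uses the standard SSIM formula with the usual constants
    C1 = (0.01 * L)^2 and C2 = (0.03 * L)^2, where L is the data range.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def motion_intensity(frames, data_range=1.0):
    """Hypothetical intensity score: 1 - mean SSIM(first frame, later frame).

    A static clip scores close to 0; larger values mean later frames
    differ more from the input frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    first = frames[0]
    scores = [global_ssim(first, f, data_range) for f in frames[1:]]
    return 1.0 - float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((4, 8, 8))  # toy grayscale clip in [0, 1]
    print(motion_residuals(clip).shape)
    print(motion_intensity(clip))
```

Under these assumptions, a score of this kind could serve as a scalar conditioning signal during training, which would let a user request weaker or stronger motion with a single number at inference time.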