PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
December 21, 2023
Authors: Yiming Zhang, Zhening Xing, Yanhong Zeng, Youqing Fang, Kai Chen
cs.AI
Abstract
Recent advancements in personalized text-to-image (T2I) models have
revolutionized content creation, empowering non-experts to generate stunning
images with unique styles. While promising, adding realistic motion to these
personalized images via text poses significant challenges: the animation must
preserve distinct styles and high-fidelity details while remaining controllable
by text. In
this paper, we present PIA, a Personalized Image Animator that excels in
aligning with condition images, achieving motion controllability by text, and
remaining compatible with various personalized T2I models without specific tuning.
To achieve these goals, PIA builds upon a base T2I model with well-trained
temporal alignment layers, allowing for the seamless transformation of any
personalized T2I model into an image animation model. A key component of PIA is
the introduction of the condition module, which utilizes the condition frame
and inter-frame affinity as input to transfer appearance information guided by
the affinity hint for individual frame synthesis in the latent space. This
design mitigates the challenges of appearance-related image alignment within the
latent space and allows for a stronger focus on aligning with motion-related guidance.
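The plug-and-play idea can be illustrated with a minimal sketch, assuming a checkpoint is simply a mapping from parameter names to weights. Everything in it is hypothetical: the `temporal.` name prefix, the function `compose_animator`, and the plain-list weights stand in for real tensors and are not PIA's actual code. What it shows is the composition pattern: temporal alignment layers trained once on top of the base T2I model are reused unchanged, while the spatial layers come from whichever personalized T2I checkpoint is swapped in.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of weights.
Params = Dict[str, List[float]]

TEMPORAL_PREFIX = "temporal."  # hypothetical naming convention for temporal layers


def is_temporal(name: str) -> bool:
    """True if a parameter belongs to the (hypothetical) temporal alignment layers."""
    return name.startswith(TEMPORAL_PREFIX)


def compose_animator(base: Params, temporal: Params, personalized: Params) -> Params:
    """Spatial weights come from the personalized T2I model; temporal weights
    are the layers trained on top of the base model, reused without tuning."""
    # The base model defines which spatial layers must be present.
    spatial_keys = {k for k in base if not is_temporal(k)}
    missing = spatial_keys.difference(personalized)
    if missing:
        raise KeyError(f"personalized checkpoint lacks: {sorted(missing)}")
    model: Params = {k: list(personalized[k]) for k in spatial_keys}
    # Attach the temporal layers as-is.
    for name, weights in temporal.items():
        if not is_temporal(name):
            raise ValueError(f"not a temporal parameter: {name}")
        model[name] = list(weights)
    return model


# Usage: swap a (toy) personalized style checkpoint in for the base model.
base = {"unet.conv_in": [0.1, 0.2], "unet.attn": [0.3]}
temporal_layers = {"temporal.block0": [0.9]}
anime_style = {"unet.conv_in": [0.5, 0.6], "unet.attn": [0.7]}
animator = compose_animator(base, temporal_layers, anime_style)
print(sorted(animator))  # ['temporal.block0', 'unet.attn', 'unet.conv_in']
```

In this sketch the personalized checkpoint only has to match the base model's spatial layer names, so no per-model training step appears anywhere in the composition.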
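The condition module's inputs can be sketched in the same hypothetical style. The abstract states only that the module takes the condition frame and an inter-frame affinity as input and operates in the latent space. The layout below is an assumption made for illustration: each frame's noisy latent, the condition-frame latent, and a constant affinity map are concatenated channel-wise. So are the linearly decaying `affinity_schedule`, the choice of frame 0 as the condition frame, and the use of nested lists in place of tensors.

```python
from typing import List

Channel = List[List[float]]  # H x W
Latent = List[Channel]       # C x H x W


def affinity_schedule(num_frames: int, floor: float = 0.2) -> List[float]:
    """Hypothetical affinity hint: 1.0 for the condition frame (index 0),
    decaying linearly to `floor` at the last frame."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [1.0]
    step = (1.0 - floor) / (num_frames - 1)
    return [1.0 - i * step for i in range(num_frames)]


def frame_condition_input(noisy: Latent, cond: Latent, affinity: float) -> Latent:
    """Stack channels (assumed layout): this frame's noisy latent, the
    condition-frame latent, and one constant map holding the affinity value."""
    h, w = len(cond[0]), len(cond[0][0])
    if len(noisy[0]) != h or len(noisy[0][0]) != w:
        raise ValueError("noisy and condition latents must share H x W")
    affinity_map = [[affinity] * w for _ in range(h)]
    return noisy + cond + [affinity_map]


# Usage: 4-channel 2x3 latents for a 5-frame clip; zeros stand in for noise.
hints = affinity_schedule(5)
cond_latent = [[[1.0] * 3 for _ in range(2)] for _ in range(4)]
inputs = [
    frame_condition_input([[[0.0] * 3 for _ in range(2)] for _ in range(4)], cond_latent, s)
    for s in hints
]
print(len(inputs), len(inputs[0]))  # 5 9
```

Because the condition latent is handed to every frame directly, appearance can be copied rather than re-synthesized, and in this sketch a higher affinity value is meant to signal that a frame should stay closer to the condition image. That is consistent with the abstract's stated motivation of freeing the model to focus on motion, but the concrete mechanism here is illustrative only.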