Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
May 21, 2024
Authors: Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu
cs.AI
Abstract
Current face reenactment and swapping methods mainly rely on GAN frameworks,
but recent focus has shifted to pre-trained diffusion models for their superior
generation capabilities. However, training these models is resource-intensive,
and the results have not yet achieved satisfactory performance levels. To
address this issue, we introduce Face-Adapter, an efficient and effective
adapter designed for high-precision, high-fidelity face editing with
pre-trained diffusion models. We observe that face reenactment and face
swapping both essentially involve combinations of target structure, ID, and
attributes. We aim to fully decouple the control of these factors so that
both tasks can be handled by a single model. Specifically, our method
contains: 1) a Spatial Condition Generator that provides precise landmarks
and background; 2) a Plug-and-play Identity Encoder that transfers face
embeddings to the text space via a transformer decoder; and 3) an Attribute
Controller that integrates spatial
conditions and detailed attributes. Face-Adapter achieves comparable or even
superior performance in terms of motion control precision, ID retention
capability, and generation quality compared to fully fine-tuned face
reenactment/swapping models. Additionally, Face-Adapter seamlessly integrates
with various StableDiffusion models.
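The abstract gives no implementation details, but the Identity Encoder's role — mapping a face-recognition embedding into tokens in the diffusion model's text-conditioning space via a transformer decoder — can be sketched as a single cross-attention step. The sketch below is a hypothetical illustration, not the authors' code: every dimension, weight, and name (`FACE_DIM`, `N_QUERIES`, `id_tokens`, etc.) is an assumption chosen for clarity.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's implementation: learnable query
# tokens cross-attend to a face-recognition embedding and emit pseudo
# "text" tokens that could be concatenated with a prompt's CLIP tokens.
# All sizes below are assumptions for illustration.

rng = np.random.default_rng(0)

FACE_DIM = 512   # common face-recognition embedding size (assumption)
TEXT_DIM = 768   # CLIP text-token width used by Stable Diffusion v1.x
N_QUERIES = 4    # number of learnable identity tokens (assumption)
N_MEM = 4        # split the face embedding into N_MEM "memory" chunks

# Learnable parameters, randomly initialized for the sketch.
queries = rng.normal(size=(N_QUERIES, TEXT_DIM)) * 0.02
W_k = rng.normal(size=(FACE_DIM // N_MEM, TEXT_DIM)) * 0.02
W_v = rng.normal(size=(FACE_DIM // N_MEM, TEXT_DIM)) * 0.02

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def id_tokens(face_embedding):
    """One cross-attention step standing in for the transformer decoder:
    queries attend to chunks of the face embedding and return N_QUERIES
    tokens shaped like text-conditioning tokens."""
    mem = face_embedding.reshape(N_MEM, FACE_DIM // N_MEM)
    k, v = mem @ W_k, mem @ W_v                        # (N_MEM, TEXT_DIM)
    attn = softmax(queries @ k.T / np.sqrt(TEXT_DIM))  # (N_QUERIES, N_MEM)
    return attn @ v                                    # (N_QUERIES, TEXT_DIM)

tokens = id_tokens(rng.normal(size=FACE_DIM))
print(tokens.shape)  # (4, 768)
```

In a real adapter the decoder would have multiple layers and be trained jointly with the frozen diffusion backbone; the point of the sketch is only the shape contract: identity information enters the model as a small sequence of text-space tokens.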