

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

May 21, 2024
Authors: Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu
cs.AI

Abstract

Current face reenactment and swapping methods mainly rely on GAN frameworks, but recent focus has shifted to pre-trained diffusion models for their superior generation capabilities. However, training these models is resource-intensive, and the results have not yet achieved satisfactory performance levels. To address this issue, we introduce Face-Adapter, an efficient and effective adapter designed for high-precision and high-fidelity face editing for pre-trained diffusion models. We observe that both face reenactment and swapping tasks essentially involve combinations of target structure, ID, and attribute. We aim to sufficiently decouple the control of these factors to achieve both tasks in one model. Specifically, our method contains: 1) a Spatial Condition Generator that provides precise landmarks and background; 2) a plug-and-play Identity Encoder that transfers face embeddings to the text space by a transformer decoder; 3) an Attribute Controller that integrates spatial conditions and detailed attributes. Face-Adapter achieves comparable or even superior performance in terms of motion control precision, ID retention capability, and generation quality compared to fully fine-tuned face reenactment/swapping models. Additionally, Face-Adapter seamlessly integrates with various StableDiffusion models.
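The abstract's second component, the plug-and-play Identity Encoder, maps a face-recognition embedding into tokens that live in the diffusion model's text-conditioning space via a transformer decoder. The sketch below is a minimal illustration of that idea, not the authors' implementation: the embedding dimension (512), text-space width (768), number of ID tokens, and layer count are all illustrative assumptions, and the class name `IdentityEncoderSketch` is hypothetical.

```python
# Minimal sketch (assumed, not the paper's code) of an identity encoder that turns
# a face embedding into text-space tokens via a transformer decoder, as described
# in the abstract. Dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class IdentityEncoderSketch(nn.Module):
    def __init__(self, face_dim=512, text_dim=768, num_tokens=4, num_layers=2):
        super().__init__()
        # Project the face-recognition embedding to the text-embedding width.
        self.proj = nn.Linear(face_dim, text_dim)
        # Learnable query tokens; after decoding they become the ID tokens.
        self.queries = nn.Parameter(torch.randn(num_tokens, text_dim) * 0.02)
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=text_dim, nhead=8, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)

    def forward(self, face_embedding):
        # face_embedding: (batch, face_dim), e.g. from a frozen face recognition model.
        memory = self.proj(face_embedding).unsqueeze(1)          # (batch, 1, text_dim)
        queries = self.queries.unsqueeze(0).expand(face_embedding.size(0), -1, -1)
        id_tokens = self.decoder(queries, memory)                # (batch, num_tokens, text_dim)
        # These tokens would be injected into the frozen diffusion model's
        # cross-attention alongside (or in place of) text-prompt tokens.
        return id_tokens


if __name__ == "__main__":
    encoder = IdentityEncoderSketch()
    fake_face_embedding = torch.randn(2, 512)
    print(encoder(fake_face_embedding).shape)  # torch.Size([2, 4, 768])
```

In this reading, only the small adapter modules are trained while the diffusion backbone stays frozen, which is consistent with the abstract's claim that Face-Adapter plugs into various StableDiffusion models without full fine-tuning.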