ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
April 25, 2024
Authors: Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang
cs.AI
Abstract
Diffusion-based technologies have made significant strides, particularly in
personalized and customized facial generation. However, existing methods face
challenges in achieving high-fidelity and detailed identity (ID) consistency,
primarily due to insufficient fine-grained control over facial areas and the
lack of a comprehensive ID-preservation strategy that fully considers both
intricate facial details and the overall face. To address these limitations, we
introduce ConsistentID, an innovative method crafted for
diverse identity-preserving portrait generation under fine-grained multimodal
facial prompts, utilizing only a single reference image. ConsistentID comprises
two key components: a multimodal facial prompt generator that combines facial
features, corresponding facial descriptions and the overall facial context to
enhance precision in facial details, and an ID-preservation network optimized
through the facial attention localization strategy, aimed at preserving ID
consistency in facial regions. Together, these components significantly enhance
the accuracy of ID preservation by introducing fine-grained multimodal ID
information from facial regions. To facilitate training of ConsistentID, we
present a fine-grained portrait dataset, FGID, with over 500,000 facial images,
offering greater diversity and comprehensiveness than existing public facial
datasets such as LAION-Face, CelebA, FFHQ, and SFHQ. Experimental results
substantiate that our ConsistentID achieves exceptional precision and diversity
in personalized facial generation, surpassing existing methods on the MyStyle
dataset. Furthermore, while ConsistentID introduces more multimodal ID
information, it maintains a fast inference speed during generation.