

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

April 25, 2024
Authors: Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang
cs.AI

Abstract

Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods struggle to achieve high-fidelity, detailed identity (ID) consistency, primarily because they lack fine-grained control over facial areas and a comprehensive ID-preservation strategy that fully accounts for both intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, a method for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts that requires only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions, and the overall facial context to enhance precision in facial details; and an ID-preservation network, optimized through a facial attention localization strategy, that preserves ID consistency in facial regions. Together, these components significantly improve the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present FGID, a fine-grained portrait dataset with over 500,000 facial images that offers greater diversity and comprehensiveness than existing public facial datasets such as LAION-Face, CelebA, FFHQ, and SFHQ. Experimental results show that ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. Furthermore, although ConsistentID introduces more multimodal ID information, it maintains fast inference during generation.
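To make the idea of a "multimodal facial prompt" more concrete, here is a toy sketch of one way such a prompt could be assembled. It is not the authors' implementation; every name (`fuse`, `build_facial_prompt`, `alpha`) is hypothetical. The sketch assumes that each facial-part word in the text prompt (for example "eyes") has a matching visual embedding of that facial region. It also assumes the two can be fused by a simple linear blend, which stands in for whatever learned fusion module the paper actually uses. A whole-face embedding is then appended as an extra context token. Plain Python lists stand in for tensors so that the example stays self-contained.

```python
from typing import Dict, List

Vector = List[float]


def fuse(text_vec: Vector, region_vec: Vector, alpha: float) -> Vector:
    """Hypothetical fusion: (1 - alpha) * word embedding + alpha * region embedding."""
    if len(text_vec) != len(region_vec):
        raise ValueError("embedding sizes must match")
    return [(1.0 - alpha) * t + alpha * r for t, r in zip(text_vec, region_vec)]


def build_facial_prompt(
    tokens: List[str],
    token_embeddings: List[Vector],
    region_embeddings: Dict[str, Vector],
    global_face_embedding: Vector,
    alpha: float = 0.5,
) -> List[Vector]:
    """Build a prompt-embedding sequence from text, facial-region, and whole-face features.

    Assumption: facial-part words are matched to regions by exact string lookup.
    """
    if len(tokens) != len(token_embeddings):
        raise ValueError("one embedding per token is required")
    prompt: List[Vector] = []
    for token, emb in zip(tokens, token_embeddings):
        region = region_embeddings.get(token)
        # Facial-part words are fused with the matching region feature;
        # every other word passes through unchanged.
        prompt.append(fuse(emb, region, alpha) if region is not None else list(emb))
    # The whole-face embedding is appended as an extra context token.
    prompt.append(list(global_face_embedding))
    return prompt


# Usage with 2-dimensional toy embeddings.
tokens = ["a", "woman", "with", "green", "eyes"]
token_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
region_embeddings = {"eyes": [0.0, 4.0]}
prompt = build_facial_prompt(tokens, token_embeddings, region_embeddings, [9.0, 9.0])
print(prompt[4])  # fused "eyes" token: [1.0, 2.0]
```

In a real diffusion pipeline, a sequence like this would be consumed by the denoiser's cross-attention layers. The blend weight `alpha` is only a placeholder for a trained projection.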

