PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

December 7, 2023
Authors: Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan
cs.AI

Abstract

Recent advances in text-to-image generation have enabled the synthesis of realistic human photos conditioned on text prompts. However, existing personalized generation methods cannot simultaneously deliver high efficiency, strong identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method that primarily encodes an arbitrary number of input ID images into a stacked ID embedding to preserve ID information. This embedding serves as a unified ID representation: it can comprehensively encapsulate the characteristics of a single input ID, and it can also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. In addition, to support the training of PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Trained on the dataset built with this pipeline, PhotoMaker demonstrates better ID preservation than methods based on test-time fine-tuning, while also providing significant speed improvements, high-quality generation results, strong generalization, and a wide range of applications. Our project page is available at https://photo-maker.github.io/.
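
The abstract does not say which encoder is used or how the stacked embedding is consumed by the generator, so the following is only an illustrative sketch of the stacking idea, not the authors' implementation. It assumes each ID image is mapped to a fixed-length vector and that "stacking" means collecting those vectors as rows of one variable-length sequence. The names `encode_image` and `stack_id_embeddings` are hypothetical, and the toy encoder is a placeholder for a real image encoder.

```python
from typing import List, Sequence

Embedding = List[float]


def encode_image(pixels: Sequence[float], dim: int = 4) -> Embedding:
    """Hypothetical stand-in for an image encoder.

    Folds the pixel values into `dim` buckets and averages each bucket,
    so every image yields a vector of the same length.
    """
    sums = [0.0] * dim
    counts = [0] * dim
    for i, value in enumerate(pixels):
        sums[i % dim] += value
        counts[i % dim] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def stack_id_embeddings(
    images: Sequence[Sequence[float]], dim: int = 4
) -> List[Embedding]:
    """Encode each ID image and stack the vectors into an (N, dim) list of rows.

    N is the number of input images, so the stack grows with the input.
    """
    if not images:
        raise ValueError("at least one ID image is required")
    return [encode_image(img, dim) for img in images]


# Three toy "images" of one identity give a stack with three rows.
alice = stack_id_embeddings(
    [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9] * 8]
)

# A single image of a second identity gives a stack with one row.
bob = stack_id_embeddings([[1.0, 0.0, 1.0, 0.0]])

# Because every row has the same length, stacks from different
# identities can be concatenated into one sequence (assumed here to be
# what "accommodating different IDs" allows).
mixed = alice + bob
print(len(alice), len(mixed), len(mixed[0]))
```

In this sketch the row count varies with the number of input images while the row length stays fixed, which is what lets one representation hold either several photos of one person or photos of different people.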