PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

December 7, 2023
Authors: Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan
cs.AI

Abstract

Text-to-image generation has recently made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, strong identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method that encodes an arbitrary number of input ID images into a stacked ID embedding to preserve ID information. Such an embedding, serving as a unified ID representation, can not only comprehensively encapsulate the characteristics of a single input ID, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. In addition, to drive the training of PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Trained on the dataset constructed through this pipeline, PhotoMaker demonstrates better ID preservation than methods based on test-time fine-tuning, while also providing significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/.
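As a rough illustration of the stacking idea only, the sketch below encodes each input image separately and collects the results into one variable-length stack. It is not PhotoMaker's implementation: `toy_encoder`, `stack_id_embeddings`, and the flat-list "images" are hypothetical stand-ins chosen for this example, and the abstract does not specify how the real encoder or any subsequent fusion works.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Image = Sequence[float]  # toy stand-in: a flat list of pixel intensities


def toy_encoder(image: Image) -> Vector:
    """Hypothetical encoder: maps an image to a 2-D vector [mean, max]."""
    return [sum(image) / len(image), max(image)]


def stack_id_embeddings(
    images: Sequence[Image],
    encoder: Callable[[Image], Vector],
) -> List[Vector]:
    """Encode each image separately and stack the results, one row per image."""
    if not images:
        raise ValueError("at least one ID image is required")
    return [encoder(image) for image in images]


# Three photos of one (hypothetical) person yield a stack of three embeddings.
person_a = [[0.1, 0.3], [0.2, 0.6], [0.5, 0.5]]
stack_a = stack_id_embeddings(person_a, toy_encoder)

# Photos of different people can share one stack; this is how the sketch
# models the abstract's "different IDs" case (an assumption, not the paper's code).
person_b = [[0.9, 0.7]]
mixed_stack = stack_id_embeddings(person_a + person_b, toy_encoder)

print(len(stack_a), len(mixed_stack))
```

Because the stack is simply one row per image, its length follows the number of inputs, which is the property the abstract refers to as handling "an arbitrary number of input ID images".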