ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Abstract

The rapid development of diffusion models has enabled a wide range of applications. Identity-preserving text-to-image generation (ID-T2I) in particular has received significant attention because of its many application scenarios, such as AI portraits and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) it is hard to accurately maintain the identity characteristics of reference portraits, (2) the generated images lack aesthetic appeal, especially when identity retention is enforced, and (3) existing methods are not compatible with both LoRA-based and Adapter-based approaches at the same time. To address these issues, we present ID-Aligner, a general feedback learning framework that enhances ID-T2I performance. To recover lost identity features, we introduce identity consistency reward fine-tuning, which uses feedback from face detection and recognition models to improve identity preservation in generated images. Furthermore, we propose identity aesthetic reward fine-tuning, which leverages rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach. Project page: \url{https://idaligner.github.io/}
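
As a rough illustration of the identity consistency reward mentioned in the abstract, the sketch below scores a generated face against a reference face. It assumes that the reward is the cosine similarity between two face-recognition embeddings and that the fine-tuning loss is one minus that reward; the abstract states neither detail, so both are assumptions made here for illustration. The function names `cosine_similarity`, `identity_reward`, and `identity_loss` are hypothetical, and the embeddings are taken as plain lists of floats that some face detection and recognition pipeline has already produced.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_reward(ref_embedding: Sequence[float],
                    gen_embedding: Sequence[float]) -> float:
    """Hypothetical identity reward: higher means the faces look more alike.

    Assumption: the reward is the cosine similarity of the two face
    embeddings, so it lies in [-1, 1].
    """
    return cosine_similarity(ref_embedding, gen_embedding)


def identity_loss(ref_embedding: Sequence[float],
                  gen_embedding: Sequence[float]) -> float:
    """Hypothetical loss to minimize during reward fine-tuning.

    Assumption: loss = 1 - reward, so a perfect identity match gives 0.
    """
    return 1.0 - identity_reward(ref_embedding, gen_embedding)
```

In an actual training loop the embeddings would need to come from a differentiable face-recognition model applied to the detected face crop, so that the loss can be backpropagated into the LoRA or Adapter weights; this pure-Python version only shows the scoring step.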
