ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Abstract

The rapid development of diffusion models has enabled a diverse range of applications. In particular, identity-preserving text-to-image generation (ID-T2I) has received significant attention because of its wide range of application scenarios, such as AI portraits and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) it is hard to accurately maintain the identity characteristics of reference portraits, (2) the generated images lack aesthetic appeal, especially when identity retention is enforced, and (3) existing methods are not compatible with both LoRA-based and Adapter-based approaches at the same time. To address these issues, we present ID-Aligner, a general feedback learning framework for enhancing ID-T2I performance. To address the loss of identity features, we introduce identity consistency reward fine-tuning, which uses feedback from face detection and recognition models to improve identity preservation in generated images. Furthermore, we propose identity aesthetic reward fine-tuning, which leverages rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach. Project Page: \url{https://idaligner.github.io/}
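The identity consistency reward is described here only at a high level. As a rough illustration, and not the authors' implementation, one could score a generated image by the cosine similarity between face-recognition embeddings of the reference face and the generated face, and use one minus that score as a fine-tuning loss. The function names and the choice of cosine similarity are assumptions made for this sketch; the embeddings are placeholder vectors that would in practice come from a face detection and recognition pipeline, which is not included.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_reward(ref_embedding: Sequence[float],
                    gen_embedding: Sequence[float]) -> float:
    """Hypothetical reward: higher when the two face embeddings agree.

    Assumption: similarity of face-recognition embeddings is used as the
    identity signal; the abstract does not specify the exact form.
    """
    return cosine_similarity(ref_embedding, gen_embedding)


def identity_loss(ref_embedding: Sequence[float],
                  gen_embedding: Sequence[float]) -> float:
    """Hypothetical loss to minimize: one minus the identity reward."""
    return 1.0 - identity_reward(ref_embedding, gen_embedding)
```

In this sketch, embeddings pointing in the same direction give a reward near 1 and a loss near 0, while orthogonal embeddings give a reward of 0 and a loss of 1.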
