InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Abstract

Personalized image generation has become markedly better at producing images that preserve a given concept. Integrating multiple concepts naturally into a single cohesive, visually appealing composition remains challenging, however. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method preserves identity effectively by using global and local features from a pre-trained face recognition model, integrated with text conditions. The masked cross-attention mechanism also enables precise control over multiple IDs and over composition in the generated images. We demonstrate the effectiveness of InstantFamily through experiments that show its advantage in generating multi-ID images while resolving well-known multi-ID generation problems. Our model achieves state-of-the-art performance in both single-ID and multi-ID preservation, and it scales to preserving more IDs than it was trained with.
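
The abstract does not spell out how the masked cross-attention is built, so the following is only a generic illustration of the idea of region-masked cross-attention, not the authors' implementation. It assumes that text tokens come first in the key/value sequence, that each ID contributes a fixed number of embedding tokens, and that each ID has a boolean mask over latent pixels; the function name `masked_cross_attention` and all parameter names are hypothetical.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, region_masks, tokens_per_id):
    """Generic region-masked cross-attention (illustrative sketch only).

    queries:       (P, d) array, one query per latent pixel.
    keys, values:  (T, d) arrays; the first n_text rows are text tokens,
                   followed by tokens_per_id rows for each ID (assumed layout).
    region_masks:  (n_ids, P) boolean array; True where a pixel belongs to
                   the region assigned to that ID.
    Returns the attended output (P, d) and the attention weights (P, T).
    """
    num_pixels, dim = queries.shape
    num_tokens = keys.shape[0]
    n_ids = region_masks.shape[0]
    n_text = num_tokens - n_ids * tokens_per_id
    if n_text < 1:
        # At least one always-visible token keeps every softmax row finite.
        raise ValueError("expected at least one text token before the ID tokens")

    # Text tokens are visible to every pixel; ID tokens only inside their region.
    allowed = np.ones((num_pixels, num_tokens), dtype=bool)
    for i in range(n_ids):
        start = n_text + i * tokens_per_id
        allowed[:, start:start + tokens_per_id] = region_masks[i][:, None]

    # Scaled dot-product scores, with disallowed pairs set to -inf.
    scores = queries @ keys.T / np.sqrt(dim)
    scores = np.where(allowed, scores, -np.inf)

    # Numerically stable softmax over the token axis.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```

In this sketch, a pixel outside every ID region attends only to the text tokens, while a pixel inside one ID's region attends to the text tokens plus that ID's tokens. This is one simple way to keep identities from leaking into each other's regions; the paper's actual design may differ.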
