InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
April 30, 2024
Authors: Chanran Kim, Jeongin Lee, Shichang Joung, Bongmo Kim, Yeul-Min Baek
cs.AI
Abstract
In personalized image generation, the ability to create images that preserve
given concepts has improved significantly. Creating an image that naturally
integrates multiple concepts in a cohesive, visually appealing composition
remains challenging. This paper introduces "InstantFamily," an approach that
employs a novel masked cross-attention mechanism and a multimodal embedding
stack to achieve zero-shot multi-ID image generation. Our method preserves ID
effectively because it uses global and local features from a pre-trained face
recognition model, integrated with text conditions. In addition, our masked
cross-attention mechanism enables precise control of multiple IDs and of
composition in the generated images. We demonstrate the effectiveness of
InstantFamily through experiments that show its superiority in generating
multi-ID images while resolving well-known multi-ID generation problems. Our
model also achieves state-of-the-art performance in both single-ID and
multi-ID preservation. Furthermore, it exhibits remarkable scalability,
preserving a greater number of IDs than it was originally trained with.
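
The abstract names a masked cross-attention mechanism but does not spell out its details. The sketch below illustrates the general idea only, under stated assumptions: each image position may attend to the text tokens everywhere, but to an ID's embedding tokens only where that position lies inside that ID's face mask. The function names, the `token_owner` / `position_owner` encoding, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax; scores of -inf receive exactly zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_allowed(position_owner, token_owner):
    """Build a boolean visibility table (assumed masking rule, not the paper's).

    position_owner[i]: ID index whose face mask covers image position i,
                       or None for background.
    token_owner[j]:    ID index that condition token j belongs to,
                       or None for a text token (visible everywhere).
    """
    return [
        [tok is None or tok == pos for tok in token_owner]
        for pos in position_owner
    ]


def masked_cross_attention(queries, keys, values, allowed):
    """Scaled dot-product cross-attention with a per-position token mask."""
    dim = len(queries[0])
    out = []
    for q, allow in zip(queries, allowed):
        if not any(allow):
            raise ValueError("each position must see at least one token")
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if ok else float("-inf")
            for k, ok in zip(keys, allow)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


# Toy setup: three image positions and three condition tokens.
position_owner = [0, 1, None]          # face of ID 0, face of ID 1, background
token_owner = [None, 0, 1]             # text token, ID 0 token, ID 1 token
queries = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
keys = [[0.2, 0.2], [1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot per token

allowed = build_allowed(position_owner, token_owner)
out = masked_cross_attention(queries, keys, values, allowed)
```

Because the values are one-hot, each output row is simply the attention weight placed on the text, ID 0, and ID 1 tokens. The position inside ID 0's face puts no weight on ID 1's token, and the background position attends only to the text token, which is the kind of per-ID isolation that masking is meant to provide.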