InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Abstract

Personalized image generation has become markedly better at creating images that preserve given concepts. Producing an image that integrates multiple concepts naturally, in a cohesive and visually appealing composition, nevertheless remains challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. The method preserves identity by combining global and local features from a pre-trained face recognition model with text conditions. The masked cross-attention mechanism, in turn, enables precise control over multiple identities and over composition in the generated images. Experiments demonstrate the effectiveness of InstantFamily, showing its strength in generating multi-ID images while resolving well-known multi-ID generation problems. The model also achieves state-of-the-art performance in both single-ID and multi-ID preservation. Furthermore, it scales well: it can preserve more identities than it was originally trained with.
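The abstract names masked cross-attention but does not spell out how it works. As a rough illustration only, the sketch below shows one common reading of the idea: each image token may attend solely to the embedding tokens of the identity whose spatial mask covers it, and tokens outside every mask receive no identity signal. This is not the authors' implementation. The function name `masked_cross_attention`, the arguments `key_owner` and `region_of_query`, and the zero output for unmasked tokens are all assumptions made for this example, and the learned query/key/value projections of a real diffusion model are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_attention(queries, keys, values, key_owner, region_of_query):
    """Cross-attention where a query only sees keys of its own identity.

    Hypothetical helper for illustration, not the paper's code.

    queries:         N query vectors (image tokens), each of length d
    keys, values:    M key / value vectors (identity embedding tokens)
    key_owner:       M ints, the identity index that owns each key
    region_of_query: N entries, the identity whose mask covers the image
                     token, or None if the token lies outside every mask
    Returns N output vectors; tokens outside every mask get all zeros
    (an assumption of this sketch).
    """
    d = len(queries[0])
    dv = len(values[0])
    outputs = []
    for q, region in zip(queries, region_of_query):
        # Keys this query is allowed to attend to: same identity only.
        allowed = [j for j, owner in enumerate(key_owner) if owner == region]
        if region is None or not allowed:
            outputs.append([0.0] * dv)
            continue
        # Scaled dot-product scores over the allowed keys only.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in allowed
        ]
        weights = softmax(scores)
        # Weighted sum of the allowed value vectors.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, allowed))
            for c in range(dv)
        ])
    return outputs


# Toy example: three image tokens, two identities with one token each.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 20.0]]
key_owner = [0, 1]                 # key 0 belongs to ID 0, key 1 to ID 1
region_of_query = [0, 1, None]     # third image token is background
out = masked_cross_attention(queries, keys, values, key_owner, region_of_query)
print(out)
```

In this toy setup the first image token receives only the value vector of identity 0, the second only that of identity 1, and the background token receives zeros, so the two identities cannot blend into each other's regions. Restricting the key set per region is also one plausible reason such a scheme could accept more identities at inference than were seen in training, although the abstract does not state the cause.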
