Imagine yourself: Tuning-Free Personalized Image Generation

Abstract

Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this work, we introduce Imagine yourself, a state-of-the-art model for personalized image generation. Unlike conventional tuning-based personalization techniques, Imagine yourself is tuning-free: all users share a single framework, with no per-user adjustment. Previous work has struggled to balance identity preservation, adherence to complex prompts, and visual quality, yielding models with a strong copy-paste effect from the reference images. As a result, such models can hardly generate images for prompts that require significant changes to the reference image, e.g., changing facial expression, head and body poses, and the diversity of the generated images is low. To address these limitations, our method introduces 1) a new synthetic paired data generation mechanism to encourage image diversity, 2) a fully parallel attention architecture with three text encoders and a fully trainable vision encoder to improve text faithfulness, and 3) a novel coarse-to-fine multi-stage finetuning methodology that gradually pushes the boundary of visual quality. Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment. This model establishes a robust foundation for various personalization applications. Human evaluation confirms that the model outperforms previous personalization models on all aspects (identity preservation, text faithfulness, and visual appeal).

