Imagine yourself: Tuning-Free Personalized Image Generation
September 20, 2024
Authors: Zecheng He, Bo Sun, Felix Juefei-Xu, Haoyu Ma, Ankit Ramchandani, Vincent Cheung, Siddharth Shah, Anmol Kalia, Harihar Subramanyam, Alireza Zareian, Li Chen, Ankit Jain, Ning Zhang, Peizhao Zhang, Roshan Sumbaly, Peter Vajda, Animesh Sinha
cs.AI
Abstract
Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-based personalization techniques, Imagine yourself operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments. Moreover, previous work struggled to balance identity preservation, adherence to complex prompts, and visual quality, resulting in models with a strong copy-paste effect from the reference images. As a result, they can hardly generate images that follow prompts requiring significant changes to the reference image, e.g., changing facial expression or head and body pose, and the diversity of the generated images is low. To address these limitations, our proposed method introduces 1) a new synthetic paired data generation mechanism to encourage image diversity, 2) a fully parallel attention architecture with three text encoders and a fully trainable vision encoder to improve text faithfulness, and 3) a novel coarse-to-fine multi-stage finetuning methodology that gradually pushes the boundary of visual quality. Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment. This model establishes a robust foundation for various personalization applications. Human evaluation results validate the model's state-of-the-art superiority over previous personalization models in all aspects (identity preservation, text faithfulness, and visual appeal).
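The abstract names a "fully parallel attention architecture" with three text encoders and one vision encoder but does not say how the branches are combined. As a minimal sketch under stated assumptions, the Python below shows one generic way to let image latent tokens attend to several conditioning streams in parallel: each stream gets its own scaled dot-product cross-attention branch, and the branch outputs are summed into a residual. The stream names, the shared token width `d`, the summation rule, and all helpers (`matmul`, `attention`, `parallel_attention`) are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n, k) and a (k, m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    kt = [list(col) for col in zip(*k)]
    scores = [[x / math.sqrt(d) for x in row] for row in matmul(q, kt)]
    return matmul(softmax_rows(scores), v)


def parallel_attention(latent: Matrix, streams: dict, weights: dict) -> Matrix:
    """Hypothetical parallel fusion: the latent queries every conditioning
    stream in its own branch; branch outputs are added onto a residual copy.
    Assumes every stream was already projected to the latent width d."""
    out = [row[:] for row in latent]  # residual connection
    for name, tokens in streams.items():
        wq, wk, wv = weights[name]  # per-branch (d, d) projections
        branch = attention(matmul(latent, wq), matmul(tokens, wk), matmul(tokens, wv))
        for i, row in enumerate(branch):
            for j, x in enumerate(row):
                out[i][j] += x
    return out


rng = random.Random(0)


def rand_matrix(rows: int, cols: int, scale: float = 1.0) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


d = 8
latent = rand_matrix(16, d)  # 16 image latent tokens
streams = {name: rand_matrix(7, d) for name in ("text_enc_1", "text_enc_2", "text_enc_3")}
streams["vision_enc"] = rand_matrix(4, d)  # reference-image tokens
weights = {name: tuple(rand_matrix(d, d, 0.1) for _ in range(3)) for name in streams}
y = parallel_attention(latent, streams, weights)
print(len(y), len(y[0]))  # 16 8
```

In this sketch each branch normalizes its own softmax, so vision tokens never compete with text tokens for attention mass, which is one plausible reason to prefer parallel branches over concatenating all tokens into a single attention call. Whether the paper uses this exact combination rule is not stated in the abstract.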