The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
November 16, 2023
Authors: Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski
cs.AI
Abstract
Recent advances in text-to-image generation models have unlocked vast
potential for visual creativity. However, these models struggle to generate
consistent characters, a crucial aspect for numerous real-world applications
such as story visualization, game development asset design, advertising, and
more. Current methods typically rely on multiple pre-existing images of the
target character or involve labor-intensive manual processes. In this work, we
propose a fully automated solution for consistent character generation, with
the sole input being a text prompt. We introduce an iterative procedure that,
at each stage, identifies a coherent set of images sharing a similar identity
and extracts a more consistent identity from this set. Our quantitative
analysis demonstrates that our method strikes a better balance between prompt
alignment and identity consistency compared to the baseline methods, and these
findings are reinforced by a user study. To conclude, we showcase several
practical applications of our approach. Project page is available at
https://omriavrahami.com/the-chosen-one
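
The iterative procedure described in the abstract (pick a coherent set of images, extract a more consistent identity from it, repeat) can be illustrated with a toy sketch. The abstract does not specify the image embedding, the clustering algorithm, or how the identity is extracted, so everything below is an assumption made for illustration: images are replaced by random 2-D vectors, the coherent set is found with a plain k-means and scored by average distance to the centroid, and "extracting the identity" is modeled by re-centering and narrowing the sampling distribution. All function names are hypothetical and this is not the authors' implementation.

```python
import math
import random


def sample_embeddings(center, spread, n, rng):
    """Toy stand-in for 'generate n images and embed them' (assumption)."""
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


def mean(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, iters=20):
    """Plain k-means; the clustering choice is an assumption of this sketch."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[idx].append(p)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return clusters


def cohesion(cluster):
    """Average distance to the cluster centroid; lower means more coherent."""
    c = mean(cluster)
    return sum(dist(p, c) for p in cluster) / len(cluster)


def most_cohesive(clusters, min_size):
    """Pick the tightest cluster among those with at least min_size members."""
    candidates = [c for c in clusters if len(c) >= min_size]
    return min(candidates, key=cohesion)


def refine_identity(n_images=64, k=4, rounds=5, min_size=5, seed=0):
    """Run the toy loop and return the spread of each round's generated set."""
    rng = random.Random(seed)
    center, spread = [0.0, 0.0], 1.0
    history = []
    for _ in range(rounds):
        points = sample_embeddings(center, spread, n_images, rng)
        history.append(cohesion(points))
        chosen = most_cohesive(kmeans(points, k, rng), min_size)
        # Toy 'identity extraction': sample the next round around the chosen set.
        center, spread = mean(chosen), cohesion(chosen)
    return history


if __name__ == "__main__":
    for i, value in enumerate(refine_identity()):
        print(f"round {i}: spread of generated set = {value:.3f}")
```

In this toy model the spread of the generated set shrinks from round to round, which mirrors the intuition that each stage yields a more consistent identity. In a real pipeline the sampling step would be a text-to-image model and the extraction step some form of model personalization, neither of which is modeled here.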