The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Abstract

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, these models struggle to generate consistent characters, a crucial requirement for many real-world applications such as story visualization, game development asset design, and advertising. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation whose sole input is a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency than the baseline methods, and these findings are reinforced by a user study. Finally, we showcase several practical applications of our approach. Project page: https://omriavrahami.com/the-chosen-one
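The iterative procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how images are embedded, grouped, or used to extract an identity, so everything below is an assumption made for illustration. Random 2-D vectors stand in for image embeddings, plain k-means stands in for "identifying a coherent set", and shrinking the sampling spread stands in for "extracting a more consistent identity". All function names and parameters are hypothetical.

```python
import math
import random


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def spread_of(points):
    """Mean Euclidean distance of the points to their centroid."""
    c = mean_point(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def sample_embeddings(center, spread, n, rng):
    """Toy stand-in (assumption) for 'generate n images and embed them'."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]


def most_cohesive_cluster(points, k, rng, min_size=3, steps=10):
    """Run plain k-means, then return the tightest cluster of at least min_size points."""
    centroids = rng.sample(points, k)
    clusters = [points]
    for _ in range(steps):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    candidates = [c for c in clusters if len(c) >= min_size] or [points]
    return min(candidates, key=spread_of)


def refine_identity(n=32, k=4, target=0.2, max_rounds=10, seed=0):
    """Repeat: sample, pick the most cohesive group, re-center on that group."""
    rng = random.Random(seed)
    center, spread = [0.0, 0.0], 1.0  # the initial "identity" is loose
    history = []
    for _ in range(max_rounds):
        points = sample_embeddings(center, spread, n, rng)
        cluster = most_cohesive_cluster(points, k, rng)
        # Stand-in (assumption) for extracting an identity from the group.
        center = mean_point(cluster)
        spread = spread_of(cluster)
        history.append(spread)
        if spread < target:  # consistent enough, stop iterating
            break
    return center, history


if __name__ == "__main__":
    final_center, spreads = refine_identity()
    print("spread per round:", [round(s, 3) for s in spreads])
```

The loop stops either when the selected group is tight enough (`target`) or after `max_rounds`, mirroring the idea of repeating the select-and-extract step until the identity is sufficiently consistent.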
