The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Abstract

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, these models struggle to generate consistent characters, a crucial capability for numerous real-world applications such as story visualization, game development asset design, and advertising. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation whose sole input is a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency than the baseline methods, and these findings are reinforced by a user study. To conclude, we showcase several practical applications of our approach. The project page is available at https://omriavrahami.com/the-chosen-one

