The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Abstract

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, these models struggle to generate consistent characters, a capability that is crucial for numerous real-world applications such as story visualization, asset design for game development, and advertising. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation whose sole input is a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency than the baseline methods, and these findings are reinforced by a user study. To conclude, we showcase several practical applications of our approach. The project page is available at https://omriavrahami.com/the-chosen-one
