Training-Free Consistent Text-to-Image Generation

Abstract

Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language. However, using these models to consistently portray the same subject across diverse prompts remains challenging. Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects or add image conditioning to the model. These methods require lengthy per-subject optimization or large-scale pre-training. Moreover, they struggle to align generated images with text prompts and face difficulties in portraying multiple subjects. Here, we present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model. We introduce a subject-driven shared attention block and correspondence-based feature injection to promote subject consistency between images. Additionally, we develop strategies to encourage layout diversity while maintaining subject consistency. We compare ConsiStory to a range of baselines, and demonstrate state-of-the-art performance on subject consistency and text alignment, without requiring a single optimization step. Finally, ConsiStory can naturally extend to multi-subject scenarios, and even enable training-free personalization for common objects.
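To make the idea of sharing attention across a batch of generated images more concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the ConsiStory implementation: the function names, the boolean `subject_masks` input, and the rule that each image sees its own patches plus only the subject patches of the other images are all hypothetical choices made for this example. The abstract does not specify how subject regions are identified or where in the pretrained model the sharing happens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_attention(queries, keys, values, subject_masks):
    """Toy single-head attention over a batch of images.

    queries, keys, values: [image][patch] -> feature vector (list of floats).
    subject_masks: [image][patch] -> bool, True where a patch is assumed to
    show the subject (hypothetical input, not taken from the abstract).

    Assumption of this sketch: image i attends to all of its own patches
    and only to the subject patches of every other image in the batch.
    """
    dim = len(queries[0][0])
    outputs = []
    for i, image_queries in enumerate(queries):
        # Gather the keys and values that image i is allowed to see.
        visible_keys, visible_values = [], []
        for j in range(len(keys)):
            for p in range(len(keys[j])):
                if j == i or subject_masks[j][p]:
                    visible_keys.append(keys[j][p])
                    visible_values.append(values[j][p])

        image_out = []
        for q in image_queries:
            # Scaled dot-product scores against every visible key.
            scores = [
                sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                for k in visible_keys
            ]
            weights = softmax(scores)
            # Weighted average of the visible value vectors.
            out_dim = len(visible_values[0])
            image_out.append([
                sum(w * v[d] for w, v in zip(weights, visible_values))
                for d in range(out_dim)
            ])
        outputs.append(image_out)
    return outputs
```

With every mask entry set to `False`, the function reduces to ordinary per-image self-attention. Marking patches as subject patches lets features from those patches flow into the other images, which is the intuition behind sharing activations without any optimization step.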
