Style Aligned Image Generation via Shared Attention

Abstract

Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods necessitating fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across images within T2I models. This approach allows for the creation of style-consistent images using a reference style through a straightforward inversion operation. Our method's evaluation across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across various inputs.
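
The abstract names attention sharing without spelling out its mechanics. The sketch below illustrates one plausible reading, under stated assumptions: in a self-attention layer, every image in a batch attends to its own tokens plus the keys and values of a single reference image, so features from the reference can flow into the other images. The helper names (`attention`, `shared_attention`), the choice of image 0 as the reference, and the toy token shapes are hypothetical. This is plain scaled dot-product attention in pure Python, not the authors' implementation or the API of any diffusion library.

```python
import math
from typing import List

Matrix = List[List[float]]  # list of token vectors (rows)


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b.T for row-major lists of vectors."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for row in matmul_t(q, k):
        weights = softmax([s / math.sqrt(d) for s in row])
        # Each output token is a convex combination of the value vectors.
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))])
    return out


def shared_attention(qs: List[Matrix], ks: List[Matrix], vs: List[Matrix], ref: int = 0) -> List[Matrix]:
    """Hypothetical shared self-attention over a batch of images.

    The reference image attends only to its own tokens; every other image
    attends to the reference image's keys/values concatenated with its own.
    """
    outputs = []
    for i, (q, k, v) in enumerate(zip(qs, ks, vs)):
        if i == ref:
            outputs.append(attention(q, k, v))
        else:
            outputs.append(attention(q, ks[ref] + k, vs[ref] + v))
    return outputs


if __name__ == "__main__":
    # Two toy "images", each with two 2-D tokens; image 0 is the reference.
    qs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.5, -0.5]]]
    ks = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    vs = [[[10.0, 10.0], [10.0, 10.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for out in shared_attention(qs, ks, vs, ref=0):
        print(out)
```

In this toy example, image 1 would output all zeros on its own, but with sharing its outputs mix in the reference image's values. In an actual T2I model the idea would be applied inside the denoiser's self-attention layers at each diffusion step; how much is shared, and whether features are normalized beforehand, are details this sketch does not attempt to reproduce.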