StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Abstract

Text-driven style transfer aims to merge the style of a reference image with content described by a text prompt. Recent advances in text-to-image models have improved the nuance of style transformations, yet significant challenges remain, particularly overfitting to reference styles, limited stylistic control, and misalignment with textual content. In this paper, we propose three complementary strategies to address these issues. First, we introduce a cross-modal Adaptive Instance Normalization (AdaIN) mechanism that better integrates style and text features, enhancing alignment. Second, we develop a Style-based Classifier-Free Guidance (SCFG) approach that enables selective control over stylistic elements, reducing irrelevant influences. Finally, we incorporate a teacher model during the early generation stages to stabilize spatial layouts and mitigate artifacts. Our extensive evaluations demonstrate significant improvements in style transfer quality and alignment with textual prompts. Furthermore, our approach can be integrated into existing style transfer frameworks without fine-tuning.
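
As background for the first strategy, the following is a minimal sketch of the standard, single-modality AdaIN operation: each channel of one feature set is normalized to zero mean and unit variance, then rescaled and shifted with the mean and standard deviation of the corresponding channel of another feature set. The abstract does not specify how the cross-modal variant is wired into the model, so this shows only the generic formula; the function names `channel_stats` and `adain`, the list-of-lists layout, and the `eps` value are illustrative assumptions, not the authors' implementation.

```python
import math


def channel_stats(values, eps=1e-5):
    """Return (mean, std) of one channel, with eps added for numerical stability."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)


def adain(content, style, eps=1e-5):
    """Generic AdaIN: re-normalize each content channel to the style channel's statistics.

    content, style: lists of channels, each channel a flat list of floats.
    The two inputs may have different lengths per channel (hypothetical layout).
    """
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = channel_stats(c_chan, eps)
        s_mean, s_std = channel_stats(s_chan, eps)
        # Normalize with content statistics, then scale and shift with style statistics.
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_chan])
    return out


if __name__ == "__main__":
    text_like = [[1.0, 2.0, 3.0]]             # stand-in for one channel of text features
    style_like = [[10.0, 20.0, 30.0, 40.0]]   # stand-in for one channel of style features
    print(adain(text_like, style_like))
```

After the call, each output channel carries the mean and (approximately) the standard deviation of the matching style channel while keeping the relative ordering of the original values.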

