StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

Abstract

This paper presents a LoRA-free method for stylized image generation that takes a text prompt and style reference images as inputs and produces an output image in a single pass. Unlike existing methods, which train a separate LoRA for each style, our method adapts to various styles with a single unified model. However, this poses two challenges: 1) the prompt loses controllability over the generated content, and 2) the output image inherits both the semantic and the style features of the style reference images, compromising its content fidelity. To address these challenges, we introduce StyleAdapter, a model that comprises two components: a two-path cross-attention module (TPCA) and three decoupling strategies. These components enable our model to process the prompt and the style reference features separately and to reduce the strong coupling between semantic and style information in the style references. StyleAdapter can generate, in a single pass, high-quality images that match the content of the prompt and adopt the style of the references (even for unseen styles), which makes it more flexible and efficient than previous methods. Experiments demonstrate the superiority of our method over previous works.
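The abstract does not specify the internals of TPCA, only that the prompt and the style reference features are processed separately. As a rough illustration of that idea, here is a minimal sketch under our own assumptions: identity projections instead of learned query/key/value matrices, keys equal to values, and a hypothetical `style_weight` that fuses the two attention outputs by a weighted sum. The function names are illustrative only; this is not StyleAdapter's implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a 1-D list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention with identity projections (assumption)."""
    d = len(queries[0])
    out: Matrix = []
    for q in queries:
        # Attention weights of this query over every key token.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Output row = attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def two_path_cross_attention(queries: Matrix, text_tokens: Matrix,
                             style_tokens: Matrix,
                             style_weight: float = 1.0) -> Matrix:
    """Hypothetical two-path block: separate attention over text and style."""
    # Path 1: image queries attend only to the prompt tokens.
    text_out = cross_attention(queries, text_tokens, text_tokens)
    # Path 2: the same queries attend only to the style-reference tokens.
    style_out = cross_attention(queries, style_tokens, style_tokens)
    # Fuse after attention, so style tokens never compete with prompt tokens
    # inside one softmax; the weighted sum is an assumed fusion rule.
    return [[t + style_weight * s for t, s in zip(t_row, s_row)]
            for t_row, s_row in zip(text_out, style_out)]


queries = [[1.0, 0.0], [0.0, 1.0]]      # toy image-latent queries
text_tokens = [[1.0, 0.0], [0.0, 1.0]]  # toy prompt embeddings
style_tokens = [[0.5, 0.5]]             # toy style-reference embedding
print(two_path_cross_attention(queries, text_tokens, style_tokens, 0.5))
```

Setting `style_weight` to 0 reduces the block to ordinary text-conditioned cross-attention. Keeping the two softmaxes separate is one plausible way a two-path design could preserve the prompt's control over content; the paper's actual projections and fusion rule may differ.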
