StyleCodes: Encoding Stylistic Information for Image Generation

Abstract

Diffusion models excel at image generation, but controlling them remains a challenge. We focus on the problem of style-conditioned image generation. Conditioning on example images works, but it is cumbersome. srefs (style-reference codes) from MidJourney address this by expressing a specific image style as a short numeric code. They have been widely adopted across social media, both because they are easy to share and because they allow an image to be used for style control without posting the source image itself. However, users cannot generate srefs from their own images, and the underlying training procedure is not public. We propose StyleCodes: an open-source and open-research style encoder architecture and training procedure that expresses image style as a 20-symbol base64 code. Our experiments show that this encoding incurs minimal loss in quality compared to traditional image-to-style techniques.
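To give a sense of how compact such a code is: each base64 symbol carries 6 bits, so 20 symbols hold 120 bits, or 15 bytes. The sketch below only illustrates that arithmetic. It assumes the code is a plain base64 packing of 15 raw bytes, which the abstract does not confirm, and the helper names `encode_style_bits` and `decode_style_bits` are hypothetical rather than part of any released StyleCodes API.

```python
import base64

CODE_SYMBOLS = 20      # code length stated in the abstract
BITS_PER_SYMBOL = 6    # one base64 symbol carries 6 bits
CODE_BYTES = CODE_SYMBOLS * BITS_PER_SYMBOL // 8  # 120 bits = 15 bytes


def encode_style_bits(raw: bytes) -> str:
    """Pack 15 raw bytes into a 20-symbol base64 string (hypothetical helper)."""
    if len(raw) != CODE_BYTES:
        raise ValueError(f"expected {CODE_BYTES} bytes, got {len(raw)}")
    # 15 is a multiple of 3, so base64 emits exactly 20 symbols and no '=' padding.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_style_bits(code: str) -> bytes:
    """Recover the 15 raw bytes from a 20-symbol code (hypothetical helper)."""
    if len(code) != CODE_SYMBOLS:
        raise ValueError(f"expected {CODE_SYMBOLS} symbols, got {len(code)}")
    return base64.urlsafe_b64decode(code)


if __name__ == "__main__":
    # Placeholder payload; a real style code would come from a trained encoder.
    payload = bytes(range(CODE_BYTES))
    code = encode_style_bits(payload)
    print(code, len(code))
    assert decode_style_bits(code) == payload
```

A string this short fits comfortably in a social media post, which is the sharing property the abstract attributes to srefs.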
