HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models

Abstract

We propose an effective method for inserting adapters into text-to-image foundation models. The adapters allow the model to perform complex downstream tasks while preserving the generalization ability of the base model. The core idea is to improve adapter performance by optimizing the attention mechanism that operates on 2D feature maps. We validated this approach on meme video generation, where it achieved significant results. We hope this work offers insights for the post-training of large text-to-image models. Because the method is highly compatible with SD1.5-derived models, it should also be of practical value to the open-source community, and we therefore plan to release the related code (https://songkey.github.io/hellomeme).
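The abstract does not spell out how attention is "optimized" for 2D feature maps. As an illustration only, the sketch below contrasts standard self-attention, which flattens an H x W feature map into a single sequence of H*W positions, with a row-then-column attention that keeps the 2D layout. Reading the "knitting" in the title as row/column attention is our assumption, not something stated here; the functions `flattened_attention` and `row_column_attention` are hypothetical, parameter-free (Q = K = V), and are not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # H rows x W columns of C-dimensional feature vectors


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Parameter-free scaled dot-product self-attention (Q = K = V = seq)."""
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in seq:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in seq])
        # Each output is a convex combination of the input vectors.
        out.append([sum(wt * v[c] for wt, v in zip(weights, seq)) for c in range(d)])
    return out


def flattened_attention(grid: Grid) -> Grid:
    """Standard approach: flatten H x W into one sequence, attend, reshape back."""
    h, w = len(grid), len(grid[0])
    seq = [grid[i][j] for i in range(h) for j in range(w)]
    out = self_attention(seq)
    return [out[i * w:(i + 1) * w] for i in range(h)]


def row_column_attention(grid: Grid) -> Grid:
    """Hypothetical 2D-aware variant: attend within each row, then within each column."""
    rows = [self_attention(row) for row in grid]
    h, w = len(rows), len(rows[0])
    cols = [self_attention([rows[i][j] for i in range(h)]) for j in range(w)]
    # Transpose the column results back into H x W layout.
    return [[cols[j][i] for j in range(w)] for i in range(h)]
```

Flattened attention scores every pair of the H*W positions, whereas the row-then-column variant scores only pairs that share a row or a column (H*W^2 + W*H^2 scores in total), so row and column structure is preserved explicitly. For a single-row map the two functions coincide, because attention over a length-1 column returns its input unchanged.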
