Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

Abstract

Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, using multiple concept LoRAs to jointly support multiple customized concepts remains a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion. Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client tuning and gradient fusion for the center node to preserve the in-domain essence of single concepts and support theoretically limitless concept fusion. Additionally, we introduce regionally controllable sampling, which extends spatially controllable sampling (e.g., ControlNet and T2I-Adapter) to address attribute binding and missing object problems in multi-concept sampling. Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes.
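
As a rough illustration of the setting described in the abstract, the sketch below shows generic LoRA arithmetic: each concept contributes a low-rank update B @ A on top of a shared pretrained weight, and a center node must combine several such updates. This is not the Mix-of-Show implementation. The abstract does not specify how ED-LoRA or gradient fusion are computed, so the fusion shown here is a plain weighted sum, used only as an assumed baseline. All helper names (matmul, add, lora_delta, naive_fuse) and the toy matrices are hypothetical.

```python
# Illustrative sketch only: generic LoRA arithmetic with a naive weighted-sum
# fusion. It does NOT reproduce ED-LoRA or the gradient fusion of Mix-of-Show.
# All helper names and the toy values below are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_delta(B, A):
    """Low-rank update B @ A, with B of shape (d_out, r) and A of shape (r, d_in)."""
    return matmul(B, A)

def naive_fuse(W0, deltas, weights):
    """Add a weighted sum of per-concept updates to the shared weight W0."""
    W = [row[:] for row in W0]  # copy so W0 is left untouched
    for delta, w in zip(deltas, weights):
        W = add(W, delta, scale=w)
    return W

# Shared pretrained 2x2 weight and two rank-1 concept updates (toy values).
W0 = [[1.0, 0.0], [0.0, 1.0]]
delta_a = lora_delta([[1.0], [0.0]], [[0.0, 2.0]])  # concept A, tuned on client A
delta_b = lora_delta([[0.0], [1.0]], [[4.0, 0.0]])  # concept B, tuned on client B

W_a = add(W0, delta_a)  # single-concept model for A
W_fused = naive_fuse(W0, [delta_a, delta_b], [0.5, 0.5])  # center-node baseline

print(W_a)      # [[1.0, 2.0], [0.0, 1.0]]
print(W_fused)  # [[1.0, 1.0], [2.0, 1.0]]
```

In this toy case the fused weight carries only half of concept A's update compared with the single-concept weight. That dilution is one plausible intuition for the identity loss during model fusion that the abstract mentions, though the abstract itself does not state the cause.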
