Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

Abstract

Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, using multiple concept LoRAs to jointly support multiple customized concepts remains a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion. Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client tuning and gradient fusion for the center node to preserve the in-domain essence of single concepts and support theoretically limitless concept fusion. Additionally, we introduce regionally controllable sampling, which extends spatially controllable sampling (e.g., ControlNet and T2I-Adapter) to address attribute binding and missing object problems in multi-concept sampling. Extensive experiments demonstrate that Mix-of-Show can compose multiple customized concepts, including characters, objects, and scenes, with high fidelity.
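As a rough illustration of the low-rank adaptation mentioned in the abstract, the sketch below adds a rank-1 update B·A to a small weight matrix and then combines two such concept updates by a plain weighted sum. This is only an assumed baseline for merging LoRAs, not the Mix-of-Show method: ED-LoRA and gradient fusion are not reproduced here. The helper names (`lora_delta`, `merge_weighted_sum`), the matrix sizes, and all values are hypothetical.

```python
# Minimal pure-Python sketch of low-rank adaptation (LoRA).
# All names, shapes, and values are illustrative assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_delta(B, A):
    """Low-rank update: B is (d_out x r), A is (r x d_in), with r small."""
    return matmul(B, A)

def merge_weighted_sum(W0, deltas, weights):
    """Naive fusion: W0 + sum_i w_i * delta_i (assumed baseline, not gradient fusion)."""
    W = [row[:] for row in W0]
    for delta, w in zip(deltas, weights):
        W = add(W, delta, scale=w)
    return W

# Pretrained weight (3 x 3 identity) and two rank-1 concept LoRAs.
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B1, A1 = [[1.0], [0.0], [0.0]], [[0.0, 2.0, 0.0]]
B2, A2 = [[0.0], [0.0], [1.0]], [[4.0, 0.0, 0.0]]

delta1 = lora_delta(B1, A1)
delta2 = lora_delta(B2, A2)

single = add(W0, delta1)                                      # one concept applied alone
fused = merge_weighted_sum(W0, [delta1, delta2], [0.5, 0.5])  # two concepts merged

print(single)
print(fused)
```

With weights of 0.5 each, every concept's update is halved in the fused matrix, so the fused weights no longer match either single-concept model.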
