Multi-LoRA Composition for Image Generation

February 26, 2024

Authors: Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen

cs.AI

Abstract

Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images. Nonetheless, existing methods face challenges in effectively composing multiple LoRAs, especially as the number of LoRAs to be integrated grows, thus hindering the creation of complex imagery. In this paper, we study multi-LoRA composition through a decoding-centric perspective. We present two training-free methods: LoRA Switch, which alternates between different LoRAs at each denoising step, and LoRA Composite, which simultaneously incorporates all LoRAs to guide more cohesive image synthesis. To evaluate the proposed approaches, we establish ComposLoRA, a new comprehensive testbed as part of this research. It features a diverse range of LoRA categories with 480 composition sets. Utilizing an evaluation framework based on GPT-4V, our findings demonstrate a clear improvement in performance with our methods over the prevalent baseline, particularly evident when increasing the number of LoRAs in a composition.
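
The difference between the two decoding-time strategies can be illustrated with a toy sketch. This is not the authors' implementation. Each LoRA is stood in for by a hypothetical function that returns a scalar noise estimate, the latent is a single float, the round-robin order used for LoRA Switch is an assumption, and averaging the per-LoRA estimates is an assumed combination rule for LoRA Composite (the abstract only states that all LoRAs are incorporated simultaneously). The names `switch_schedule`, `composite_estimate`, and `denoise` are illustrative only.

```python
from typing import Callable, List, Sequence

# Stand-in for "the diffusion model with one LoRA enabled":
# maps (latent, step) -> noise estimate. Purely hypothetical.
NoisePredictor = Callable[[float, int], float]


def switch_schedule(num_loras: int, num_steps: int) -> List[int]:
    """Index of the single active LoRA at each denoising step (round-robin, assumed)."""
    return [step % num_loras for step in range(num_steps)]


def composite_estimate(predictors: Sequence[NoisePredictor],
                       latent: float, step: int) -> float:
    """Combine the estimates of all LoRAs at one step (plain average, assumed)."""
    estimates = [predict(latent, step) for predict in predictors]
    return sum(estimates) / len(estimates)


def denoise(predictors: Sequence[NoisePredictor], latent: float,
            num_steps: int, mode: str, step_size: float = 0.1) -> float:
    """Toy denoising loop: subtract a scaled noise estimate at every step."""
    schedule = switch_schedule(len(predictors), num_steps)
    for step in range(num_steps):
        if mode == "switch":
            # Only one LoRA is active at this step.
            eps = predictors[schedule[step]](latent, step)
        elif mode == "composite":
            # Every LoRA contributes at this step.
            eps = composite_estimate(predictors, latent, step)
        else:
            raise ValueError(f"unknown mode: {mode}")
        latent = latent - step_size * eps
    return latent


if __name__ == "__main__":
    # Two hypothetical LoRAs, e.g. a character LoRA and a style LoRA.
    character = lambda latent, step: 1.0
    style = lambda latent, step: 3.0
    print(switch_schedule(2, 6))
    print(denoise([character, style], 0.0, 4, "switch", step_size=0.5))
    print(denoise([character, style], 0.0, 4, "composite", step_size=0.5))
```

In this sketch, the switch mode evaluates one predictor per step, while the composite mode evaluates every predictor at every step, so its per-step cost grows with the number of LoRAs.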