

Multi-LoRA Composition for Image Generation

February 26, 2024
Authors: Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen
cs.AI

Abstract

Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images. Nonetheless, existing methods face challenges in effectively composing multiple LoRAs, especially as the number of LoRAs to be integrated grows, thus hindering the creation of complex imagery. In this paper, we study multi-LoRA composition through a decoding-centric perspective. We present two training-free methods: LoRA Switch, which alternates between different LoRAs at each denoising step, and LoRA Composite, which simultaneously incorporates all LoRAs to guide more cohesive image synthesis. To evaluate the proposed approaches, we establish ComposLoRA, a new comprehensive testbed as part of this research. It features a diverse range of LoRA categories with 480 composition sets. Utilizing an evaluation framework based on GPT-4V, our findings demonstrate a clear improvement in performance with our methods over the prevalent baseline, particularly evident when increasing the number of LoRAs in a composition.
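The two decoding-centric strategies described in the abstract can be sketched at a high level. The sketch below is illustrative only and assumes toy stand-ins: each "LoRA" is represented by a simple noise-prediction callable, and `denoise_step` is a hypothetical one-line update rather than a real diffusion scheduler. In the actual methods, each prediction would come from the diffusion model with the corresponding LoRA weights loaded, combined with classifier-free guidance.

```python
import numpy as np

def denoise_step(latent, eps, step_size=0.1):
    """Toy denoising update: move the latent against the predicted noise.
    A real sampler (e.g. DDIM) uses a schedule-dependent update instead."""
    return latent - step_size * eps

def lora_switch(latent, lora_predictors, num_steps):
    """LoRA Switch: activate exactly one LoRA per denoising step,
    cycling through the LoRAs round-robin across steps."""
    for i in range(num_steps):
        active = lora_predictors[i % len(lora_predictors)]
        eps = active(latent)                  # noise predicted with this LoRA alone
        latent = denoise_step(latent, eps)
    return latent

def lora_composite(latent, lora_predictors, num_steps):
    """LoRA Composite: at every step, query all LoRAs and average their
    (guidance) predictions to steer a single cohesive update."""
    for _ in range(num_steps):
        eps = np.mean([p(latent) for p in lora_predictors], axis=0)
        latent = denoise_step(latent, eps)
    return latent

# Two toy "LoRA" noise predictors pulling the latent in opposite directions.
loras = [lambda z: 0.5 * z, lambda z: -0.5 * z]
z0 = np.ones(4)
z_switch = lora_switch(z0, loras, num_steps=4)
z_composite = lora_composite(z0, loras, num_steps=4)
```

With these opposing toy predictors, Composite's averaged guidance cancels out and leaves the latent unchanged, while Switch applies each influence in turn; the contrast is only meant to show that the two strategies traverse the denoising trajectory differently.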