Training-Free Multi-Concept LoRA Composition Based on Prompt-Aware Weighting

Abstract

Low-Rank Adaptation (LoRA) successfully enables personalization in text-to-image generation by adapting pre-trained diffusion models to specific visual concepts and styles. However, extending such models to multi-concept customization remains challenging. Naively combining multiple LoRA weights or their outputs often leads to interference among concepts, resulting in degraded visual quality and reduced fidelity to the reference images of individual concepts. This paper proposes a simple yet effective approach for multi-concept customization by optimally combining the outputs of multiple LoRA modules. We leverage the relative importance of each concept during generation, as inferred from its corresponding prompt tokens, and introduce two methods, W-Switch and W-Composite, that employ a prompt-aware importance weighting strategy in which each LoRA is weighted according to the semantic influence of its trigger words in the target prompt. In addition, we extend existing quantitative evaluation metrics by proposing a new image-based similarity evaluation framework that assesses image fidelity and identity preservation by comparing real-world reference images with automatically segmented concept regions from generated images. We evaluate our approach on the ComposLoRA testbed and demonstrate consistent improvements over existing state-of-the-art methods in terms of visual quality, identity preservation, and compositionality. Qualitative evaluations, including a large language model (LLM)-based assessment and a user study, further validate the effectiveness of the proposed methods and align with the newly introduced quantitative image-based metrics. Our code is available at https://github.com/GeorgeTsoumplekas/Prompt-Aware-Multi-LoRA-Composition.
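
The abstract does not say how the per-concept importance is computed or how the weights enter the denoising loop, so the following is only an illustrative sketch of the general idea of importance-weighted LoRA combination, not the authors' implementation. It assumes that one scalar importance score per LoRA is already available (here simply passed in), that scores are normalized with a softmax, that a composite-style variant takes a weighted sum of per-LoRA outputs, and that a switch-style variant activates one LoRA per denoising step with step counts roughly proportional to the weights. All function names and the scheduling rule are hypothetical.

```python
import math


def normalize_importance(scores, temperature=1.0):
    """Turn raw per-LoRA importance scores into weights that sum to 1.

    Assumption: a softmax is used; the paper may normalize differently.
    """
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_composite(lora_outputs, weights):
    """Composite-style combination: element-wise weighted sum of outputs.

    Each entry of lora_outputs is a flat list of floats standing in for the
    output produced with one LoRA active (hypothetical stand-in for a tensor).
    """
    if len(lora_outputs) != len(weights):
        raise ValueError("need exactly one weight per LoRA output")
    length = len(lora_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, lora_outputs))
        for i in range(length)
    ]


def weighted_switch_schedule(weights, num_steps):
    """Switch-style combination: pick one LoRA index per denoising step.

    Assumption: a smooth weighted round-robin, so that a LoRA with weight w
    is active for about w * num_steps steps, spread across the schedule.
    """
    credits = [0.0] * len(weights)
    schedule = []
    for _ in range(num_steps):
        credits = [c + w for c, w in zip(credits, weights)]
        chosen = max(range(len(weights)), key=lambda i: credits[i])
        credits[chosen] -= 1.0
        schedule.append(chosen)
    return schedule
```

With weights such as 0.5, 0.25, and 0.25 over four steps, the switch-style schedule in this sketch activates the first LoRA twice and each of the others once, while the composite-style function blends all three outputs at every step.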