Training-Free Multi-Concept LoRA Composition with Prompt-Aware Weighting

Abstract

Low-Rank Adaptation (LoRA) successfully enables personalization in text-to-image generation by adapting pre-trained diffusion models to specific visual concepts and styles. However, extending such models to multi-concept customization remains challenging. Naively combining multiple LoRA weights or their outputs often leads to interference among concepts, resulting in degraded visual quality and reduced fidelity to the reference images of individual concepts. This paper proposes a simple yet effective approach to multi-concept customization that optimally combines the outputs of multiple LoRA modules. We leverage the relative importance of each concept during generation, as inferred from its corresponding prompt tokens, and introduce two methods, W-Switch and W-Composite, that employ a prompt-aware importance weighting strategy in which each LoRA is weighted according to the semantic influence of its trigger words in the target prompt. In addition, we extend existing quantitative evaluation metrics with a new image-based similarity evaluation framework that assesses image fidelity and identity preservation by comparing real-world reference images with automatically segmented concept regions from generated images. We evaluate our approach on the ComposLoRA testbed and demonstrate consistent improvements over existing state-of-the-art methods in visual quality, identity preservation, and compositionality. Qualitative evaluations, including a Large Language Model (LLM)-based assessment and a user study, further validate the effectiveness of the proposed methods and agree with the newly introduced image-based quantitative metrics. Our code is available at https://github.com/GeorgeTsoumplekas/Prompt-Aware-Multi-LoRA-Composition.
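The abstract does not specify how concept importance is computed from the prompt tokens, nor exactly how W-Switch and W-Composite consume the resulting weights. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes each LoRA comes with one score per trigger token (for example, a hypothetical cross-attention mass), which is averaged and passed through a temperature softmax to obtain weights. A "W-Composite"-style step then takes a weighted average of per-LoRA classifier-free-guidance noise predictions, and a "W-Switch"-style schedule activates one LoRA per denoising step so that each LoRA's share of steps tracks its weight. The function names `importance_weights`, `w_composite`, `w_switch_schedule` and the parameters `temperature` and `guidance_scale` are hypothetical, and noise predictions are plain Python lists rather than tensors.

```python
import math
from typing import Dict, List, Sequence


def importance_weights(token_scores: Dict[str, Sequence[float]],
                       temperature: float = 1.0) -> Dict[str, float]:
    """Map each LoRA's trigger-token scores to a normalized weight.

    Assumption (hypothetical): token_scores[name] holds one score per trigger
    token of that LoRA. Scores are averaged per concept, then a temperature
    softmax makes the weights positive and sum to 1.
    """
    means = {name: sum(s) / len(s) for name, s in token_scores.items()}
    top = max(means.values())  # subtract the max for numerical stability
    exps = {name: math.exp((m - top) / temperature) for name, m in means.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def w_composite(cond: Dict[str, List[float]],
                uncond: Dict[str, List[float]],
                weights: Dict[str, float],
                guidance_scale: float = 7.5) -> List[float]:
    """Weighted average of per-LoRA classifier-free-guidance noise predictions.

    cond[name] / uncond[name] are the conditional / unconditional predictions
    obtained with only that LoRA active (every LoRA is evaluated each step).
    """
    size = len(next(iter(cond.values())))
    out = [0.0] * size
    for name, w in weights.items():
        for j in range(size):
            guided = uncond[name][j] + guidance_scale * (cond[name][j] - uncond[name][j])
            out[j] += w * guided
    return out


def w_switch_schedule(weights: Dict[str, float], num_steps: int) -> List[str]:
    """Pick one active LoRA per denoising step, proportionally to its weight.

    At step t, choose the LoRA whose usage count lags furthest behind its
    target share weights[name] * (t + 1); ties go to the first key.
    """
    counts = {name: 0 for name in weights}
    schedule: List[str] = []
    for t in range(num_steps):
        name = max(weights, key=lambda k: weights[k] * (t + 1) - counts[k])
        counts[name] += 1
        schedule.append(name)
    return schedule
```

For example, `w = importance_weights({"cat": [2.0, 2.0], "hat": [1.0, 1.0]})` gives the "cat" LoRA the larger weight, and `w_switch_schedule(w, 10)` then activates it on most of the ten steps. In this sketch the two variants trade cost for coverage: the composite step runs every LoRA at every step, while the switch schedule runs only one LoRA per step.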