K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Abstract

Recent studies have explored combining different LoRAs to jointly generate a learned style and learned content. However, existing methods either fail to preserve the original subject and style at the same time or require additional training. In this paper, we argue that the intrinsic properties of LoRA can effectively guide diffusion models in merging a learned subject and style. Building on this insight, we propose K-LoRA, a simple yet effective training-free LoRA fusion approach. In each attention layer, K-LoRA compares the Top-K elements of each LoRA to be fused and uses the comparison to decide which LoRA to select for that layer. This selection mechanism retains the most representative features of both subject and style during fusion and balances their contributions. Experimental results show that the proposed method effectively integrates the subject and style information learned by the original LoRAs and outperforms state-of-the-art training-based approaches in both qualitative and quantitative evaluations.
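The per-layer selection described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Top-K elements are compared, so this sketch assumes the score is the sum of the K largest absolute entries of each LoRA's weight update for a layer, and that the LoRA with the higher score is used for that layer. The names `topk_score` and `select_lora` are hypothetical and are not taken from the paper or from any library.

```python
import heapq
from typing import List, Tuple

Matrix = List[List[float]]


def topk_score(delta_w: Matrix, k: int) -> float:
    """Sum of the k largest absolute entries of one LoRA weight update.

    Assumption: this scoring rule is an illustrative choice; the abstract
    only states that Top-K elements are compared.
    """
    magnitudes = [abs(value) for row in delta_w for value in row]
    # heapq.nlargest returns at most k items, so k may exceed the matrix size.
    return sum(heapq.nlargest(k, magnitudes))


def select_lora(subject_delta: Matrix, style_delta: Matrix, k: int) -> Tuple[str, Matrix]:
    """Pick, for a single attention layer, the LoRA update with the larger Top-K score."""
    subject_score = topk_score(subject_delta, k)
    style_score = topk_score(style_delta, k)
    # Ties go to the subject LoRA; this tie-break is an arbitrary assumption.
    if subject_score >= style_score:
        return "subject", subject_delta
    return "style", style_delta


# Toy weight updates for one layer (made-up numbers, for illustration only).
subject_update = [[3.0, 0.1], [0.0, -2.0]]
style_update = [[1.0, 1.0], [1.0, -1.5]]

chosen_name, chosen_update = select_lora(subject_update, style_update, k=2)
print(chosen_name)
```

In this toy case the subject update has a few large entries while the style update is spread more evenly, so with a small K the subject LoRA is selected for the layer. A full fusion would repeat this choice for every attention layer.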
