Multi-LoRA Composition for Image Generation

Abstract

Low-Rank Adaptation (LoRA) is widely used in text-to-image models to render specific elements, such as distinct characters or unique styles, accurately in generated images. However, existing methods struggle to compose multiple LoRAs effectively, especially as the number of LoRAs to be integrated grows, which hinders the creation of complex imagery. In this paper, we study multi-LoRA composition from a decoding-centric perspective. We present two training-free methods: LoRA Switch, which alternates between different LoRAs at each denoising step, and LoRA Composite, which incorporates all LoRAs simultaneously to guide more cohesive image synthesis. To evaluate the proposed approaches, we establish ComposLoRA, a new comprehensive testbed, as part of this research. It features a diverse range of LoRA categories with 480 composition sets. Using an evaluation framework based on GPT-4V, we find that our methods clearly outperform the prevalent baseline, particularly as the number of LoRAs in a composition increases.
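
The contrast between the two decoding strategies can be illustrated with a toy sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: denoisers are modeled as plain functions over lists of floats, LoRA Switch is assumed to pick the active LoRA in round-robin order, and LoRA Composite is assumed to average the per-LoRA noise estimates. The names `make_toy_denoiser`, `switch_index`, `lora_switch_step`, `lora_composite_step`, and `denoise` are hypothetical and do not come from the paper or any library.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a LoRA-adapted denoiser:
# it maps (latent, step) to a noise estimate of the same length.
Denoiser = Callable[[List[float], int], List[float]]


def make_toy_denoiser(bias: float) -> Denoiser:
    """Build a toy denoiser that always predicts a constant noise value."""
    def predict(latent: List[float], step: int) -> List[float]:
        return [bias for _ in latent]
    return predict


def switch_index(step: int, num_loras: int, interval: int = 1) -> int:
    """Assumed round-robin rule: change the active LoRA every `interval` steps."""
    return (step // interval) % num_loras


def lora_switch_step(latent: List[float], step: int,
                     denoisers: Sequence[Denoiser], interval: int = 1) -> List[float]:
    """Switch-style step: only one LoRA is active at this denoising step."""
    active = denoisers[switch_index(step, len(denoisers), interval)]
    return active(latent, step)


def lora_composite_step(latent: List[float], step: int,
                        denoisers: Sequence[Denoiser]) -> List[float]:
    """Composite-style step: query every LoRA and average the estimates (assumed)."""
    preds = [d(latent, step) for d in denoisers]
    return [sum(vals) / len(preds) for vals in zip(*preds)]


def denoise(latent: List[float], num_steps: int,
            step_fn: Callable[[List[float], int], List[float]],
            step_size: float = 0.1) -> List[float]:
    """Toy denoising loop: subtract a scaled noise estimate at every step."""
    for step in range(num_steps):
        noise = step_fn(latent, step)
        latent = [x - step_size * e for x, e in zip(latent, noise)]
    return latent


# Three toy "LoRAs", e.g. a character, a style, and an object.
loras = [make_toy_denoiser(b) for b in (1.0, 2.0, 3.0)]
switched = denoise([0.0, 0.0], 6, lambda z, t: lora_switch_step(z, t, loras))
composited = denoise([0.0, 0.0], 6, lambda z, t: lora_composite_step(z, t, loras))
```

In this sketch the Switch variant costs one denoiser call per step, while the Composite variant costs one call per LoRA per step. That trade-off is a property of the toy code and is not a claim taken from the abstract.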
