CollectionLoRA: Unifying 50 Effects into a Single LoRA via Multi-Teacher On-Policy Distillation

Abstract

Customized image editing aims to equip pre-trained diffusion models with specific visual effects using limited paired data, typically via Low-Rank Adaptation (LoRA). As the number of desired effects grows, storing and dynamically loading many such effect LoRAs significantly increases deployment overhead. Furthermore, current pipelines typically cascade these effect LoRAs with acceleration modules for fast generation, which triggers severe parameter interference and results in concept bleeding and style degradation. We propose CollectionLoRA, a multi-teacher on-policy distillation framework that distills the concepts of up to 50 different effect LoRAs, together with few-step generation capability, into a single LoRA. This fundamentally resolves the feature interference issue and significantly reduces deployment costs. Specifically, the method introduces (i) a Probabilistic Dual-Stream Routing mechanism that lets the model randomly switch between data sources during training, improving its generalization to unseen scenarios; (ii) an Asymmetric Orthogonal Prompting strategy that achieves concept isolation within the prompt space; and (iii) a Coarse-to-Fine Distillation Objective that mitigates the distribution gap between the teacher and student models. Extensive evaluations show that CollectionLoRA distills all customized effects and few-step generation into a single LoRA, reducing deployment overhead while achieving concept fidelity comparable to or better than that of independently trained teacher models.
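As a rough illustration of the random switching between data sources described in (i), the sketch below draws each training sample from one of two streams according to a fixed probability. It is only a minimal sketch under stated assumptions: the abstract does not say what the two streams contain or how the switching probability is set, so the function name dual_stream_router, the stream arguments, and the parameter p_a are all hypothetical and are not taken from the paper.

```python
import random
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def dual_stream_router(
    stream_a: Sequence[T],
    stream_b: Sequence[T],
    p_a: float,
    steps: int,
    seed: int = 0,
) -> Iterator[Tuple[str, T]]:
    """Yield (source, sample) pairs, picking stream A with probability p_a.

    Hypothetical helper: the streams and p_a are illustrative assumptions,
    not the routing rule used by CollectionLoRA.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must lie in [0, 1]")
    if not stream_a or not stream_b:
        raise ValueError("both streams must be non-empty")

    rng = random.Random(seed)  # local generator keeps runs reproducible
    for _ in range(steps):
        # rng.random() is uniform on [0, 1), so this branch fires with
        # probability p_a.
        if rng.random() < p_a:
            yield "a", rng.choice(stream_a)
        else:
            yield "b", rng.choice(stream_b)


if __name__ == "__main__":
    # Placeholder sample identifiers standing in for real training data.
    for source, sample in dual_stream_router(["a0", "a1"], ["b0", "b1"], p_a=0.5, steps=5):
        print(source, sample)
```

A per-step coin flip is the simplest way to mix two sources without fixing their order in advance; a real training loop would replace the placeholder strings with batches from its own data loaders.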