CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation
May 25, 2026
Authors: Fangtai Wu, Hailong Guo, Shijie Huang, Jiayi Song, Yubo Huang, Mushui Liu, Zhao Wang, Yunlong Yu, Jiaming Liu, Ruihua Huang
cs.AI
Abstract
Customized image editing aims to equip pre-trained diffusion models with specific visual effects using limited paired data, typically via Low-Rank Adaptation (LoRA). As the number of desired effects grows, storing and dynamically loading many such effect LoRAs significantly increases deployment overhead. Furthermore, current pipelines typically cascade these effect LoRAs with acceleration modules for fast generation, which causes severe parameter interference and results in concept bleeding and style degradation. We propose CollectionLoRA, a multi-teacher on-policy distillation framework that distills the concepts of up to 50 different effect LoRAs, along with few-step generation capability, into a single LoRA. This fundamentally resolves the feature interference issue and significantly reduces deployment costs. Specifically, the method introduces (i) a Probabilistic Dual-Stream Routing mechanism that lets the model randomly switch between data sources during training, improving its generalization to unseen scenarios; (ii) an Asymmetric Orthogonal Prompting strategy that achieves concept isolation within the prompt space; and (iii) a Coarse-to-Fine Distillation Objective that mitigates the distribution gap between the teacher and student models. Extensive evaluations show that CollectionLoRA distills all customized effects and few-step generation into a single LoRA, reducing deployment overhead while achieving concept fidelity comparable to or better than that of independently trained teacher models.
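The abstract does not say which two data sources the Probabilistic Dual-Stream Routing mechanism switches between, nor what switching probability it uses. The following is only a minimal sketch of the general idea of choosing a data stream at random at each training step. The function name `route_batches`, the stream labels `"a"` and `"b"`, and the parameter `p_a` are hypothetical and are not taken from the paper.

```python
import itertools
import random
from typing import Iterator, Tuple, TypeVar

T = TypeVar("T")


def route_batches(
    stream_a: Iterator[T],
    stream_b: Iterator[T],
    p_a: float,
    num_steps: int,
    rng: random.Random,
) -> Iterator[Tuple[str, T]]:
    """Yield (source_label, batch) pairs for num_steps training steps.

    At each step, stream A is chosen with probability p_a and stream B
    otherwise. Both the labels and p_a are illustrative assumptions.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must lie in [0, 1]")
    for _ in range(num_steps):
        # rng.random() returns a float in [0.0, 1.0), so p_a = 1.0 always
        # selects stream A and p_a = 0.0 always selects stream B.
        if rng.random() < p_a:
            yield "a", next(stream_a)
        else:
            yield "b", next(stream_b)


if __name__ == "__main__":
    # Placeholder batches; a real pipeline would yield tensors instead.
    source_a = itertools.cycle(["batch_from_source_a"])
    source_b = itertools.cycle(["batch_from_source_b"])
    rng = random.Random(0)
    for label, batch in route_batches(source_a, source_b, 0.5, 5, rng):
        print(label, batch)
```

Passing a seeded `random.Random` instance keeps the routing sequence reproducible across runs, which can make it easier to compare training configurations.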