Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss
January 5, 2024
Authors: Yatharth Gupta, Vishnu V. Jaddipal, Harish Prabhala, Sayak Paul, Patrick von Platen
cs.AI
Abstract
Stable Diffusion XL (SDXL) has become the leading open-source text-to-image (T2I) model, known for its versatility and top-notch image quality. Efficiently addressing the computational demands of SDXL is crucial for wider reach and applicability. In this work, we introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B- and 0.74B-parameter UNets respectively, achieved through progressive removal of layers guided by layer-level losses, with a focus on reducing model size while preserving generative quality. We release the weights of these models at https://hf.co/Segmind. Our methodology involves eliminating residual networks and transformer blocks from the U-Net structure of SDXL, resulting in significant reductions in parameters and latency. Our compact models effectively emulate the original SDXL by capitalizing on transferred knowledge, achieving competitive results against the larger multi-billion-parameter SDXL. Our work underscores the efficacy of knowledge distillation coupled with layer-level losses in reducing model size while preserving the high-quality generative capabilities of SDXL, thereby facilitating more accessible deployment in resource-constrained environments.
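The abstract describes distilling a pruned student UNet against the teacher SDXL with losses applied at the layer (feature) level as well as at the output. A minimal NumPy sketch of such an objective is shown below; the function name, the weighting parameters, and the simple MSE form are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def layer_level_distillation_loss(student_feats, teacher_feats,
                                  student_out, teacher_out,
                                  feat_weight=1.0, out_weight=1.0):
    """Hypothetical layer-level distillation objective (illustrative only):
    an output-level MSE between student and teacher predictions, plus an
    MSE between intermediate feature maps at matched UNet blocks."""
    # Output-level term: match the teacher's final prediction.
    out_loss = np.mean((student_out - teacher_out) ** 2)
    # Layer-level term: sum of per-block feature-matching losses.
    feat_loss = sum(np.mean((s - t) ** 2)
                    for s, t in zip(student_feats, teacher_feats))
    return out_weight * out_loss + feat_weight * feat_loss

# Toy usage with random "features" standing in for UNet block activations.
rng = np.random.default_rng(0)
s_feats = [rng.standard_normal((4, 8)) for _ in range(3)]
t_feats = [rng.standard_normal((4, 8)) for _ in range(3)]
loss = layer_level_distillation_loss(s_feats, t_feats,
                                     rng.standard_normal((4, 8)),
                                     rng.standard_normal((4, 8)))
```

In practice the feature pairs would come from the retained blocks of the pruned student and the corresponding blocks of the frozen teacher, so the student is supervised at every depth rather than only at the output.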