Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss
January 5, 2024
Authors: Yatharth Gupta, Vishnu V. Jaddipal, Harish Prabhala, Sayak Paul, Patrick von Platen
cs.AI
Abstract
Stable Diffusion XL (SDXL) has become the leading open-source text-to-image (T2I) model, known for its versatility and top-notch image quality. Efficiently addressing the computational demands of SDXL is crucial for wider reach and applicability. In this work, we introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B- and 0.74B-parameter UNets respectively, achieved through progressive removal of layers guided by layer-level losses, with a focus on reducing model size while preserving generative quality. We release the weights of these models at https://hf.co/Segmind. Our methodology involves eliminating residual networks and transformer blocks from the U-Net structure of SDXL, resulting in significant reductions in parameters and latency. Our compact models effectively emulate the original SDXL by capitalizing on transferred knowledge, achieving competitive results against the larger multi-billion-parameter SDXL. Our work underscores the efficacy of knowledge distillation coupled with layer-level losses in reducing model size while preserving the high-quality generative capabilities of SDXL, thereby facilitating more accessible deployment in resource-constrained environments.
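The abstract describes distilling a pruned student UNet against the teacher SDXL with losses applied at the layer (feature) level as well as at the output. A minimal NumPy sketch of such an objective is shown below; the function name, the weighting parameters, and the simple MSE form are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def layer_level_distillation_loss(student_feats, teacher_feats,
                                  student_out, teacher_out,
                                  feat_weight=1.0, out_weight=1.0):
    """Hypothetical layer-level distillation objective (illustrative only):
    an output-level MSE between student and teacher predictions, plus an
    MSE between intermediate feature maps at matched UNet blocks."""
    # Output-level term: match the teacher's final prediction.
    out_loss = np.mean((student_out - teacher_out) ** 2)
    # Layer-level term: sum of per-block feature-matching losses.
    feat_loss = sum(np.mean((s - t) ** 2)
                    for s, t in zip(student_feats, teacher_feats))
    return out_weight * out_loss + feat_weight * feat_loss

# Toy usage with random "features" standing in for UNet block activations.
rng = np.random.default_rng(0)
s_feats = [rng.standard_normal((4, 8)) for _ in range(3)]
t_feats = [rng.standard_normal((4, 8)) for _ in range(3)]
loss = layer_level_distillation_loss(s_feats, t_feats,
                                     rng.standard_normal((4, 8)),
                                     rng.standard_normal((4, 8)))
```

In practice the feature pairs would come from the retained blocks of the pruned student and the corresponding blocks of the frozen teacher, so the student is supervised at every depth rather than only at the output.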