Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss

January 5, 2024
Authors: Yatharth Gupta, Vishnu V. Jaddipal, Harish Prabhala, Sayak Paul, Patrick von Platen
cs.AI

Abstract

Stable Diffusion XL (SDXL) has become the leading open-source text-to-image (T2I) model, noted for its versatility and top-notch image quality. Efficiently addressing the computational demands of SDXL models is crucial for wider reach and applicability. In this work, we introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B- and 0.74B-parameter UNets respectively, achieved through progressive removal of layers guided by layer-level losses that focus on reducing model size while preserving generative quality. We release these model weights at https://hf.co/Segmind. Our methodology involves the elimination of residual networks and transformer blocks from the U-Net structure of SDXL, resulting in significant reductions in parameters and latency. Our compact models effectively emulate the original SDXL by capitalizing on transferred knowledge, achieving competitive results against the larger multi-billion-parameter SDXL. Our work underscores the efficacy of knowledge distillation coupled with layer-level losses in reducing model size while preserving the high-quality generative capabilities of SDXL, thus facilitating more accessible deployment in resource-constrained environments.
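
The abstract names the core technique, knowledge distillation driven by a layer-level (feature) loss, without giving implementation details. The sketch below is a minimal PyTorch illustration of that idea under stated assumptions: the teacher and student UNets are treated as plain callables returning a noise-prediction tensor, and `matched_layers`, `lambda_out`, and `lambda_feat` are hypothetical names for the paired teacher/student blocks and loss weights, not the authors' code.

```python
# Minimal sketch of distillation with an output-level and a layer-level loss.
# All names here (teacher_unet, student_unet, matched_layers, lambda_*) are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F


def distillation_step(teacher_unet, student_unet, latents, timesteps, text_emb,
                      matched_layers, lambda_out=1.0, lambda_feat=1.0):
    """One step: match the teacher's noise prediction (output-level) and the
    activations of hand-picked pairs of corresponding U-Net blocks (layer-level)."""
    feats = {"teacher": [], "student": []}
    hooks = []
    # Capture intermediate activations of the matched teacher/student blocks.
    for t_layer, s_layer in matched_layers:
        hooks.append(t_layer.register_forward_hook(
            lambda mod, inp, out, store=feats["teacher"]: store.append(out)))
        hooks.append(s_layer.register_forward_hook(
            lambda mod, inp, out, store=feats["student"]: store.append(out)))

    with torch.no_grad():
        teacher_pred = teacher_unet(latents, timesteps, text_emb)  # frozen teacher
    student_pred = student_unet(latents, timesteps, text_emb)      # trainable student

    for h in hooks:
        h.remove()

    # Output-level term: mimic the teacher's noise prediction.
    loss_out = F.mse_loss(student_pred, teacher_pred)
    # Layer-level term: mimic the teacher's intermediate feature maps.
    loss_feat = sum(F.mse_loss(s, t)
                    for s, t in zip(feats["student"], feats["teacher"]))
    return lambda_out * loss_out + lambda_feat * loss_feat
```

In a setup like this, the plain denoising loss against the ground-truth noise is commonly added on top of the two distillation terms, and the matched block pairs are chosen so that their feature maps have identical shapes; how the paper weights and schedules these terms is not stated in the abstract.

Since the weights are published on the Hugging Face Hub and the variants are described as slimmed-down SDXL UNets, a checkpoint that keeps the standard SDXL layout should load with diffusers' StableDiffusionXLPipeline. The snippet below assumes the segmind/SSD-1B repository follows that layout; it is a usage sketch, not an instruction from the paper.

```python
# Usage sketch: assumes the released checkpoint keeps the standard SDXL layout
# and therefore loads with diffusers' StableDiffusionXLPipeline.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
).to("cuda")
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```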