Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
February 12, 2025
Authors: Hoigi Seo, Wongi Jeong, Jae-sun Seo, Se Young Chun
cs.AI
Abstract
Large-scale text encoders in text-to-image (T2I) diffusion models have
demonstrated exceptional performance in generating high-quality images from
textual prompts. Unlike denoising modules that rely on multiple iterative
steps, text encoders require only a single forward pass to produce text
embeddings. However, despite their minimal contribution to total inference time
and floating-point operations (FLOPs), text encoders demand significantly
higher memory usage, up to eight times more than denoising modules. To address
this inefficiency, we propose Skip and Re-use layers (Skrr), a simple yet
effective pruning strategy specifically designed for text encoders in T2I
diffusion models. Skrr exploits the inherent redundancy in transformer blocks
by selectively skipping or reusing certain layers in a manner tailored for T2I
tasks, thereby reducing memory consumption without compromising performance.
Extensive experiments demonstrate that Skrr maintains image quality comparable
to the original model even under high sparsity levels, outperforming existing
blockwise pruning methods. Furthermore, Skrr achieves state-of-the-art memory
efficiency while preserving performance across multiple evaluation metrics,
including the FID, CLIP, DreamSim, and GenEval scores.
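The core mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation; the layer count, the schedule, and the toy "layers" are all hypothetical. It only demonstrates the structural idea: a full stack of transformer layers is replaced by a shorter execution schedule in which some layers are skipped and others are re-used (executed more than once), so the weights of skipped layers never need to be loaded, reducing memory.

```python
# Hedged sketch of "skip and re-use" layer pruning (not the paper's code).
# Each toy "layer" just adds a distinct offset to a scalar hidden state, so
# the effect of skipping or re-using a layer is easy to trace by hand.

def make_layers(n):
    # Stand-ins for transformer blocks: layer i maps h -> h + i.
    return [lambda h, i=i: h + i for i in range(n)]

def forward_full(layers, h):
    # Original text encoder: run every layer once, in order.
    for layer in layers:
        h = layer(h)
    return h

def forward_skrr(layers, schedule, h):
    # Pruned encoder: execute layers in the order given by `schedule`.
    # An index may repeat (re-use); indices absent from the schedule are
    # skipped entirely, so their weights need not be kept in memory.
    for i in schedule:
        h = layers[i](h)
    return h

layers = make_layers(6)            # full model: layers 0..5
schedule = [0, 1, 1, 4, 5]         # hypothetical: skip 2 and 3, re-use 1
kept = sorted(set(schedule))       # only these layers' weights are needed

print(forward_full(layers, 0.0))            # 0+0+1+2+3+4+5 = 15.0
print(forward_skrr(layers, schedule, 0.0))  # 0+0+1+1+4+5 = 11.0
print(f"layers kept in memory: {len(kept)} / {len(layers)}")
```

In a real text encoder the layers would be transformer blocks and the schedule would be chosen to minimize degradation on T2I metrics; the sketch only shows how a repeat-and-skip schedule shrinks the set of weights that must reside in memory.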