EdgeFusion: On-Device Text-to-Image Generation

April 18, 2024
Authors: Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim
cs.AI

Abstract

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle to its practical application. To tackle this challenge, recent research has focused on methods that reduce sampling steps, such as the Latent Consistency Model (LCM), and on architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches, we uniquely start with a compact SD variant, BK-SDM. We observe that directly applying LCM to BK-SDM with commonly used crawled datasets yields unsatisfactory results. This leads us to develop two strategies: (1) leveraging high-quality image-text pairs from leading generative models and (2) designing an advanced distillation process tailored for LCM. Through thorough exploration of quantization, profiling, and on-device deployment, we achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices.
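For context, the few-step sampling the abstract describes can be expressed concisely with the Hugging Face diffusers library. The sketch below is illustrative only, not the authors' released pipeline: it loads the public BK-SDM checkpoint and swaps in an LCM scheduler, but usable two-step outputs additionally require an LCM-distilled U-Net (which this paper trains; no such public checkpoint is assumed here), and the paper's quantization and edge deployment steps are out of scope for this snippet.

```python
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

# Load a compact SD variant. "nota-ai/bk-sdm-small" is the public BK-SDM
# checkpoint; it stands in here for the LCM-distilled model the paper builds.
pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small",
    torch_dtype=torch.float16,
)

# Replace the default multi-step scheduler with the LCM scheduler,
# which is designed for 1-4 step sampling.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# Two-step generation. LCM-style sampling typically uses low or no
# classifier-free guidance (guidance_scale=1.0 disables it).
image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=2,
    guidance_scale=1.0,
).images[0]
image.save("two_step_sample.png")
```

Without LCM distillation of the U-Net, two steps of a standard SD model would produce noise; the point of the sketch is only to show how few-step inference is invoked once such a distilled model exists.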
