
EdgeFusion: On-Device Text-to-Image Generation

April 18, 2024
作者: Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim
cs.AI

Abstract

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle to its practical application. To tackle this challenge, recent research has focused on methods that reduce the number of sampling steps, such as the Latent Consistency Model (LCM), and on architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches, we uniquely start from a compact SD variant, BK-SDM. We observe that directly applying LCM to BK-SDM with commonly used crawled datasets yields unsatisfactory results. This leads us to develop two strategies: (1) leveraging high-quality image-text pairs from leading generative models and (2) designing an advanced distillation process tailored for LCM. Through a thorough exploration of quantization, profiling, and on-device deployment, we achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices.

