Few-Step Distillation for Text-to-Image Generation: A Practical Guide
December 15, 2025
Authors: Yifan Pu, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Fan Wang, Bohan Zhuang, Gao Huang
cs.AI
Abstract
Diffusion distillation has dramatically accelerated class-conditional image synthesis, but its applicability to open-ended text-to-image (T2I) generation remains unclear. We present the first systematic study that adapts and compares state-of-the-art distillation techniques on a strong T2I teacher model, FLUX.1-lite. By casting existing methods into a unified framework, we identify the key obstacles that arise when moving from discrete class labels to free-form language prompts. Beyond a thorough methodological analysis, we offer practical guidelines on input scaling, network architecture, and hyperparameters, accompanied by an open-source implementation and pretrained student models. Our findings establish a solid foundation for deploying fast, high-fidelity, and resource-efficient diffusion generators in real-world T2I applications. Code is available at github.com/alibaba-damo-academy/T2I-Distill.