TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
July 24, 2025
Authors: Minghao Fu, Guo-Hua Wang, Xiaohao Chen, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang
cs.AI
Abstract
Recent advances in text-to-image synthesis largely benefit from sophisticated
sampling strategies and classifier-free guidance (CFG) to ensure high-quality
generation. However, CFG's reliance on two forward passes, especially when
combined with intricate sampling algorithms, results in prohibitively high
inference costs. To address this, we introduce TeEFusion (Text
Embeddings Fusion), a novel and efficient distillation method
that directly incorporates the guidance magnitude into the text embeddings and
distills the teacher model's complex sampling strategy. By simply fusing
conditional and unconditional text embeddings using linear operations,
TeEFusion reconstructs the desired guidance without adding extra parameters,
simultaneously enabling the student model to learn from the teacher's output
produced via its sophisticated sampling approach. Extensive experiments on
state-of-the-art models such as SD3 demonstrate that our method allows the
student to closely mimic the teacher's performance with a far simpler and more
efficient sampling strategy. Consequently, the student model achieves inference
speeds up to 6× faster than the teacher model, while maintaining image
quality at levels comparable to those obtained through the teacher's complex
sampling approach. The code is publicly available at
https://github.com/AIDC-AI/TeEFusion.
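
The contrast between two-pass CFG and a single pass on fused text embeddings can be illustrated with a toy sketch. The abstract states only that the conditional and unconditional embeddings are fused "using linear operations", so the exact rule below (`uncond + w * (cond - uncond)`, mirroring the usual CFG combination) is an assumption, not the paper's confirmed formula. The names `extrapolate` and `toy_denoiser`, and the purely linear "network", are hypothetical stand-ins chosen so the example runs without any model weights.

```python
import math
import random

random.seed(0)

def extrapolate(cond, uncond, w):
    """Linear extrapolation from `uncond` toward `cond` with guidance scale w.

    Applied to network outputs this is the standard CFG combination; applied to
    text embeddings it is the ASSUMED fusion rule for this sketch.
    """
    return [u + w * (c - u) for c, u in zip(cond, uncond)]

def toy_denoiser(x, emb, A, B):
    """Hypothetical linear stand-in for a denoiser: out = A @ x + B @ emb."""
    return [
        sum(a * xi for a, xi in zip(row_a, x)) + sum(b * ei for b, ei in zip(row_b, emb))
        for row_a, row_b in zip(A, B)
    ]

def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]

latent_dim, emb_dim = 4, 8
A = [rand_vec(latent_dim) for _ in range(latent_dim)]  # toy weights on the latent
B = [rand_vec(emb_dim) for _ in range(latent_dim)]     # toy weights on the text embedding

x = rand_vec(latent_dim)        # noisy latent
emb_cond = rand_vec(emb_dim)    # embedding of the prompt
emb_uncond = rand_vec(emb_dim)  # embedding of the empty prompt
w = 5.0                         # guidance scale

# Teacher-style CFG: two forward passes, combined in output space.
teacher_out = extrapolate(
    toy_denoiser(x, emb_cond, A, B),
    toy_denoiser(x, emb_uncond, A, B),
    w,
)

# Student-style inference: fuse the embeddings first, then one forward pass.
fused_emb = extrapolate(emb_cond, emb_uncond, w)
student_out = toy_denoiser(x, fused_emb, A, B)

max_gap = max(abs(t - s) for t, s in zip(teacher_out, student_out))
```

For this linear toy model the two routes agree up to floating-point error, because the extrapolation commutes with a linear map. A real denoiser is nonlinear, so the single-pass output would generally differ from the CFG output; that gap is what a distillation objective would have to close by training the student on the teacher's outputs.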