One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models

Abstract



Text-to-Image (T2I) diffusion models have made remarkable advances in generative modeling; however, they face a trade-off between inference speed and image quality, which makes efficient deployment challenging. Existing distilled T2I models can generate high-fidelity images with fewer sampling steps, but they often struggle with diversity and quality, especially one-step models. Our analysis reveals redundant computation in the UNet encoders. Our findings suggest that, for T2I diffusion models, decoders are more adept at capturing richer and more explicit semantic information, while encoders can be effectively shared across decoders from different time steps. Based on these observations, we introduce the first Time-independent Unified Encoder (TiUE) for the student UNet architecture, a loop-free image generation approach for distilling T2I diffusion models. Using a one-pass scheme, TiUE shares encoder features across multiple decoder time steps, enabling parallel sampling and significantly reducing inference time complexity. In addition, we incorporate a KL divergence term to regularize noise prediction, which enhances the perceptual realism and diversity of the generated images. Experimental results demonstrate that TiUE outperforms state-of-the-art methods, including LCM, SD-Turbo, and SwiftBrushv2, producing more diverse and realistic results while maintaining computational efficiency.
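As a rough illustration of the one-pass idea, the Python sketch below contrasts conventional sequential sampling, which reruns the encoder at every step, with a scheme that encodes once and shares those features with the decoder at several timesteps. Everything here is hypothetical: `ToyStudentUNet`, its toy encode/decode rules, and the helper names are illustrative stand-ins, not the authors' implementation. The KL function is also an assumption. It shows one common form, the KL divergence from a Gaussian fitted to the predicted noise to N(0, 1), because the abstract does not specify the exact regularizer. The abstract also does not say how the per-timestep decoder outputs are combined into the final image, so the sketch stops once those outputs are produced.

```python
import math
from typing import List, Sequence

Vector = List[float]


class ToyStudentUNet:
    """Hypothetical toy stand-in for a student UNet split into encoder and decoder."""

    def __init__(self) -> None:
        self.encoder_calls = 0
        self.decoder_calls = 0

    def encode(self, latent: Sequence[float], text_emb: Sequence[float]) -> Vector:
        # Time-independent encoder: it takes no timestep argument.
        self.encoder_calls += 1
        return [x + c for x, c in zip(latent, text_emb)]

    def decode(self, features: Sequence[float], t: float) -> Vector:
        # Timestep-conditioned decoder (toy rule: scale features by 1 - t).
        self.decoder_calls += 1
        return [f * (1.0 - t) for f in features]


def loop_sampling(unet: ToyStudentUNet, latent: Sequence[float],
                  text_emb: Sequence[float], timesteps: Sequence[float]) -> Vector:
    # Conventional sequential sampling: every step reruns encoder AND decoder,
    # and each step depends on the previous step's output.
    x = list(latent)
    for t in timesteps:
        x = unet.decode(unet.encode(x, text_emb), t)
    return x


def one_pass_decoding(unet: ToyStudentUNet, latent: Sequence[float],
                      text_emb: Sequence[float], timesteps: Sequence[float]) -> List[Vector]:
    # One-pass scheme: encode once, then feed the SAME features to the decoder
    # at every timestep. The decoder calls do not depend on one another, so a
    # real implementation could batch them and run them in parallel.
    features = unet.encode(latent, text_emb)
    return [unet.decode(features, t) for t in timesteps]


def kl_to_standard_normal(noise_pred: Sequence[float]) -> float:
    # Assumed regularizer: KL(N(mu, var) || N(0, 1)) with mu and var estimated
    # from the predicted noise. The abstract does not give the exact term.
    n = len(noise_pred)
    mu = sum(noise_pred) / n
    var = max(sum((e - mu) ** 2 for e in noise_pred) / n, 1e-8)  # avoid log(0)
    return 0.5 * (var + mu * mu - 1.0 - math.log(var))


if __name__ == "__main__":
    steps = [0.75, 0.5, 0.25]
    z, c = [0.1, -0.2, 0.3], [0.05, 0.05, 0.05]

    seq = ToyStudentUNet()
    loop_sampling(seq, z, c, steps)
    shared = ToyStudentUNet()
    one_pass_decoding(shared, z, c, steps)
    print(seq.encoder_calls, shared.encoder_calls)  # 3 1
    print(kl_to_standard_normal([1.0, -1.0]))  # 0.0
```

The counters make the saving concrete: with K timesteps, the sequential loop calls the encoder K times, whereas the shared-encoder scheme calls it once and leaves K mutually independent decoder calls that can be batched.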
