One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models

May 28, 2025
Authors: Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang
cs.AI

Abstract

Text-to-Image (T2I) diffusion models have made remarkable advancements in generative modeling; however, they face a trade-off between inference speed and image quality, posing challenges for efficient deployment. Existing distilled T2I models can generate high-fidelity images with fewer sampling steps, but often struggle with diversity and quality, especially in one-step models. From our analysis, we observe redundant computations in the UNet encoders. Our findings suggest that, for T2I diffusion models, decoders are more adept at capturing richer and more explicit semantic information, while encoders can be effectively shared across decoders from diverse time steps. Based on these observations, we introduce the first Time-independent Unified Encoder (TiUE) for the student model UNet architecture, a loop-free image generation approach for distilling T2I diffusion models. Using a one-pass scheme, TiUE shares encoder features across multiple decoder time steps, enabling parallel sampling and significantly reducing inference time complexity. In addition, we incorporate a KL divergence term to regularize noise prediction, which enhances the perceptual realism and diversity of the generated images. Experimental results demonstrate that TiUE outperforms state-of-the-art methods, including LCM, SD-Turbo, and SwiftBrushv2, producing more diverse and realistic results while maintaining computational efficiency.
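To make the one-pass idea concrete, the sketch below shows the pattern the abstract describes: the encoder is run once on the noisy latent, its features are reused by the decoder at several time steps evaluated in parallel as a single batch, and a KL term regularizes the noise predictions against a reference. This is a minimal illustration under assumed shapes and module names (ToyEncoder, ToyDecoder, tiue_step, kl_noise_regularizer are hypothetical), not the authors' implementation; the exact form of their KL regularizer is likewise an assumption here.

```python
# Minimal sketch of a time-independent shared encoder for distillation.
# All module names, shapes, and the KL formulation are illustrative
# assumptions, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Stand-in for the UNet encoder; time-independent in this sketch."""
    def __init__(self, in_ch=4, feat_ch=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.SiLU(),
        )

    def forward(self, x):
        return self.conv(x)  # shared bottleneck/skip features


class ToyDecoder(nn.Module):
    """Stand-in for the UNet decoder; still conditioned on the time step."""
    def __init__(self, feat_ch=64, out_ch=4):
        super().__init__()
        self.t_embed = nn.Linear(1, feat_ch)
        self.conv = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, out_ch, 3, padding=1),
        )

    def forward(self, feats, t):
        # Broadcast a simple time embedding onto the shared encoder features.
        temb = self.t_embed(t.view(-1, 1).float())[:, :, None, None]
        return self.conv(feats + temb)  # predicted noise


def tiue_step(encoder, decoder, z, timesteps):
    """One-pass scheme: encode once, decode all time steps in parallel."""
    feats = encoder(z)                              # single encoder pass
    k = timesteps.numel()
    feats_rep = feats.repeat_interleave(k, dim=0)   # reuse features k times
    t_rep = timesteps.repeat(z.shape[0])            # one t per decoder copy
    return decoder(feats_rep, t_rep)                # k noise predictions/sample


def kl_noise_regularizer(eps_student, eps_teacher):
    """KL term on softmax-normalized noise predictions (assumed proxy)."""
    p = F.log_softmax(eps_student.flatten(1), dim=-1)
    q = F.softmax(eps_teacher.flatten(1), dim=-1)
    return F.kl_div(p, q, reduction="batchmean")


if __name__ == "__main__":
    enc, dec = ToyEncoder(), ToyDecoder()
    z = torch.randn(2, 4, 32, 32)                   # noisy latents
    ts = torch.tensor([999, 749, 499, 249])         # decoder time steps
    eps = tiue_step(enc, dec, z, ts)                # -> (2*4, 4, 32, 32)
    loss = kl_noise_regularizer(eps, torch.randn_like(eps))
    print(eps.shape, loss.item())
```

The key point of the sketch is the cost structure: the encoder forward runs once per sample regardless of how many decoder time steps are evaluated, which is what allows the decoder calls to be batched and sampled in parallel rather than looped sequentially.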
