ChatPaper.ai

HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration

March 8, 2026
Authors: Desen Sun, Jason Hon, Jintao Zhang, Sihang Liu
cs.AI

Abstract

Diffusion models have demonstrated a remarkable ability in Text-to-Image (T2I) generation applications. Despite the advanced generation output, they suffer from heavy computation overhead, especially for large models that contain tens of billions of parameters. Prior work has illustrated that replacing part of the denoising steps with a smaller model still maintains the generation quality. However, these methods only focus on saving computation for some timesteps, ignoring the difference in compute demand within one timestep. In this work, we propose HybridStitch, a new T2I generation paradigm that treats generation like editing. Specifically, we introduce a hybrid stage that jointly incorporates both the large model and the small model. HybridStitch separates the entire image into two regions: one that is relatively easy to render, enabling an early transition to the smaller model, and another that is more complex and therefore requires refinement by the large model. HybridStitch employs the small model to construct a coarse sketch while exploiting the large model to edit and refine the complex regions. According to our evaluation, HybridStitch achieves a 1.83× speedup on Stable Diffusion 3, which is faster than all existing mixture-of-model methods.
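To make the hybrid stage concrete, here is a minimal toy sketch of the idea in NumPy. Everything below is an illustrative assumption, not the authors' implementation: `small_denoise`, `large_denoise`, the `complexity` map, and the `switch_step`/`threshold` parameters are all hypothetical stand-ins. The sketch follows one reading of the abstract: all pixels start on the large model, and after a switch point the easy (low-complexity) pixels transition to the cheap model while complex pixels keep being refined by the expensive one.

```python
import numpy as np

def small_denoise(x, t):
    # Stand-in for the cheap model's denoising update (hypothetical).
    return x * 0.9

def large_denoise(x, t):
    # Stand-in for the expensive model's denoising update (hypothetical).
    return x * 0.8

def hybrid_stitch(x, complexity, steps=10, switch_step=4, threshold=0.5):
    """Toy per-pixel model stitching across timesteps.

    Early timesteps run the large model everywhere; after `switch_step`,
    pixels whose complexity is below `threshold` switch to the small
    model, while complex pixels continue with the large model.
    """
    mask = complexity > threshold  # True where the region is complex
    for t in range(steps):
        if t < switch_step:
            x = large_denoise(x, t)
        else:
            # Stitch the two models' outputs pixel-wise via the mask.
            x = np.where(mask, large_denoise(x, t), small_denoise(x, t))
    return x
```

In a real system the per-region routing would be driven by the diffusion model's own signals (the abstract does not specify how complexity is estimated), and the speedup comes from the small model's forward pass being much cheaper than the large model's for the easy region.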