

HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration

March 8, 2026
Authors: Desen Sun, Jason Hon, Jintao Zhang, Sihang Liu
cs.AI

Abstract

Diffusion models have demonstrated remarkable ability in Text-to-Image (T2I) generation applications. Despite the advanced generation output, they suffer from heavy computation overhead, especially for large models that contain tens of billions of parameters. Prior work has illustrated that replacing part of the denoising steps with a smaller model still maintains the generation quality. However, these methods only focus on saving computation for some timesteps, ignoring the difference in compute demand within one timestep. In this work, we propose HybridStitch, a new T2I generation paradigm that treats generation like editing. Specifically, we introduce a hybrid stage that jointly incorporates both the large model and the small model. HybridStitch separates the entire image into two regions: one that is relatively easy to render, enabling an early transition to the smaller model, and another that is more complex and therefore requires refinement by the large model. HybridStitch employs the small model to construct a coarse sketch while exploiting the large model to edit and refine the complex regions. According to our evaluation, HybridStitch achieves a 1.83× speedup on Stable Diffusion 3, which is faster than all existing mixture-of-models methods.
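The hybrid stage described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: `small_model`, `large_model`, the mask construction, and the update rule are all hypothetical stand-ins (the real system partitions the latent per timestep and would run the large model only on the complex region to save compute; this toy evaluates both models densely for simplicity and merges per-region with a boolean mask).

```python
import numpy as np

def small_model(x, t):
    # Hypothetical stand-in for the lightweight denoiser that builds
    # the coarse sketch of the easy-to-render region.
    return x * 0.9

def large_model(x, t):
    # Hypothetical stand-in for the full-size denoiser that edits and
    # refines the complex region.
    return x * 0.8

def hybrid_denoise_step(latent, mask, t):
    """One hybrid-stage step: the small model provides the coarse update,
    and the large model's refinement overwrites the complex region
    selected by `mask` (True = complex)."""
    coarse = small_model(latent, t)
    refined = large_model(latent, t)
    # In the actual method the large model would be restricted to the
    # masked region; here np.where just merges the two outputs.
    return np.where(mask, refined, coarse)

# Toy 4x4 "latent"; the right half is marked as the complex region.
latent = np.ones((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[:, 2:] = True
out = hybrid_denoise_step(latent, mask, t=10)
```

In this toy, the left half of `out` comes from the small model and the right half from the large model, mirroring the paper's split between a coarse sketch region and a refined complex region.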