
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

June 17, 2024
Authors: Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang
cs.AI

Abstract

Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pruned model for all input prompts, overlooking the varying capacity requirements of different prompts. Dynamic pruning addresses this issue by utilizing a separate sub-network for each prompt, but it prevents batch parallelism on GPUs. To overcome these limitations, we introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method designed for T2I diffusion models. Central to our approach is a prompt router model, which learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts. Each architecture code represents a specialized model tailored to the prompts assigned to it, and the number of codes is a hyperparameter. We train the prompt router and architecture codes using contrastive learning, ensuring that similar prompts are mapped to nearby codes. Further, we employ optimal transport to prevent the codes from collapsing into a single one. We demonstrate APTP's effectiveness by pruning Stable Diffusion (SD) V2.1 using CC3M and COCO as target datasets. APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores. Our analysis of the clusters learned by APTP reveals they are semantically meaningful. We also show that APTP can automatically discover previously empirically found challenging prompts for SD, e.g., prompts for generating text images, assigning them to higher capacity codes.
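The routing idea described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not the paper's implementation): prompts are pre-computed embeddings, each "architecture code" is a vector, prompts are scored against codes by cosine similarity, and a Sinkhorn normalization (an optimal-transport step) balances the assignment so all prompts do not collapse onto a single code. All names (`sinkhorn`, `route_prompts`) and the toy dimensions are assumptions for illustration.

```python
import numpy as np

def sinkhorn(scores, n_iters=50, eps=0.05):
    """Turn a prompt-to-code score matrix into a balanced soft assignment.

    Alternating row/column normalization approximates an optimal-transport
    plan in which each code receives roughly equal total mass, which is the
    role optimal transport plays in preventing code collapse.
    """
    Q = np.exp(scores / eps)
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)  # each prompt's weights sum to 1
        Q /= Q.sum(axis=0, keepdims=True)  # each code gets equal total mass
    return Q / Q.sum(axis=1, keepdims=True)

def route_prompts(prompt_embs, codes):
    """Assign each prompt embedding to the index of one architecture code."""
    p = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    c = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    scores = p @ c.T            # cosine similarity, shape (n_prompts, n_codes)
    soft = sinkhorn(scores)     # balanced soft assignment
    return soft.argmax(axis=1)  # hard routing decision per prompt

rng = np.random.default_rng(0)
prompts = rng.normal(size=(8, 16))  # 8 toy "prompt embeddings", dim 16
codes = rng.normal(size=(4, 16))    # 4 toy architecture codes
assignments = route_prompts(prompts, codes)  # one code index per prompt
```

At inference time such hard assignments are what restore batch parallelism: all prompts routed to the same code share one pruned sub-network, unlike per-prompt dynamic pruning.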

