
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

June 17, 2024
Authors: Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang
cs.AI

Abstract
Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pruned model for all input prompts, overlooking the varying capacity requirements of different prompts. Dynamic pruning addresses this issue by utilizing a separate sub-network for each prompt, but it prevents batch parallelism on GPUs. To overcome these limitations, we introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method designed for T2I diffusion models. Central to our approach is a prompt router model, which learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts. Each architecture code represents a specialized model tailored to the prompts assigned to it, and the number of codes is a hyperparameter. We train the prompt router and architecture codes using contrastive learning, ensuring that similar prompts are mapped to nearby codes. Further, we employ optimal transport to prevent the codes from collapsing into a single one. We demonstrate APTP's effectiveness by pruning Stable Diffusion (SD) V2.1 using CC3M and COCO as target datasets. APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores. Our analysis of the clusters learned by APTP reveals they are semantically meaningful. We also show that APTP can automatically discover previously empirically found challenging prompts for SD, e.g., prompts for generating text images, assigning them to higher capacity codes.
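The routing mechanism the abstract describes — a prompt router that assigns each input prompt to one of a small set of architecture codes, with optimal transport preventing all prompts from collapsing onto a single code — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt embeddings, code vectors, and the Sinkhorn-style balancing shown here are assumptions standing in for the paper's learned components.

```python
import numpy as np

def sinkhorn(scores, n_iters=5, eps=0.05):
    """Balance a prompts-by-codes score matrix with Sinkhorn iterations
    (an optimal-transport-style normalization), so that no single
    architecture code absorbs all of the prompts."""
    Q = np.exp(scores / eps)
    Q /= Q.sum()
    n, k = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True)  # equal total mass per code
        Q /= k
        Q /= Q.sum(axis=1, keepdims=True)  # one unit of mass per prompt
        Q /= n
    return Q * n  # each row is a soft assignment summing to 1

def route(prompt_embs, codes):
    """Assign each prompt embedding to its architecture code.
    In APTP the embeddings come from a learned router trained with
    contrastive learning; here they are just given vectors."""
    p = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    c = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    scores = p @ c.T                # cosine similarity prompt <-> code
    Q = sinkhorn(scores)           # balanced soft assignments
    return Q.argmax(axis=1)        # hard routing at inference time
```

Because every prompt routed to the same code shares one pruned sub-network, batches can still be grouped by code and run in parallel on a GPU, which is the practical advantage over per-prompt dynamic pruning.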
