Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

Abstract

Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. However, their high computational cost makes it difficult for resource-constrained organizations to deploy T2I models after fine-tuning them on their internal target data. Pruning techniques offer a potential way to reduce the computational burden of T2I models, but static pruning methods use the same pruned model for all input prompts, overlooking the varying capacity requirements of different prompts. Dynamic pruning addresses this issue by using a separate sub-network for each prompt, but it prevents batch parallelism on GPUs. To overcome these limitations, we introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method designed for T2I diffusion models. Central to our approach is a prompt router model, which learns to determine the capacity required for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts. Each architecture code represents a specialized model tailored to the prompts assigned to it, and the number of codes is a hyperparameter. We train the prompt router and architecture codes using contrastive learning, ensuring that similar prompts are mapped to nearby codes. Further, we employ optimal transport to prevent the codes from collapsing into a single one. We demonstrate APTP's effectiveness by pruning Stable Diffusion (SD) V2.1 using CC3M and COCO as target datasets. APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores. Our analysis of the clusters learned by APTP reveals that they are semantically meaningful. We also show that APTP can automatically discover prompts that were previously found empirically to be challenging for SD, e.g., prompts for generating text images, and assign them to higher-capacity codes.
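The role of optimal transport in keeping the codes from collapsing can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the solver, so the example assumes an entropic (Sinkhorn-style) formulation with uniform marginals, and the names `sinkhorn_assign`, `eps`, and the toy `scores` matrix are hypothetical. It shows how a plain argmax over prompt-to-code similarities can send every prompt to one code, while a balanced transport plan spreads prompts across codes.

```python
import math


def sinkhorn_assign(scores, n_iters=50, eps=0.05):
    """Balanced soft assignment of prompts (rows) to codes (columns).

    scores[i][j] is an assumed similarity between prompt i and code j.
    Returns a matrix whose rows sum to 1 and whose columns each carry
    roughly n_prompts / n_codes of the total mass.
    """
    n, k = len(scores), len(scores[0])
    # Subtract the global maximum before exponentiating for stability.
    m = max(max(row) for row in scores)
    q = [[math.exp((s - m) / eps) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: every code receives the same total mass n / k.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / col[j] * (n / k) for j in range(k)] for i in range(n)]
        # Row step: every prompt distributes a total mass of 1.
        q = [[v / sum(row) for v in row] for row in q]
    return q


def argmax_rows(matrix):
    """Index of the largest entry in each row (hard routing decision)."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Toy similarities: all four prompts prefer code 0 to varying degrees.
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]

naive = argmax_rows(scores)        # every prompt lands on code 0 (collapse)
plan = sinkhorn_assign(scores)     # balanced soft assignment
balanced = argmax_rows(plan)       # prompts are split across both codes

print("naive routing:   ", naive)
print("balanced routing:", balanced)
```

In this toy case the balanced plan keeps the two prompts with the strongest preference for code 0 there and moves the other two to code 1. The uniform column marginal is a simplifying assumption; the method described above additionally accounts for a total compute budget, which this sketch does not model.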
