Stylus: Automatic Adapter Selection for Diffusion Models
April 29, 2024
Authors: Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Ruslan Salakhutdinov, Ion Stoica
cs.AI
Abstract
Beyond scaling base models with more data or parameters, fine-tuned adapters
provide an alternative way to generate high-fidelity, custom images at reduced
costs. As such, adapters have been widely adopted by open-source communities,
accumulating a database of over 100K adapters, most of which are highly
customized with insufficient descriptions. This paper explores the problem of
matching the prompt to a set of relevant adapters, building on recent work that
highlights the performance gains of composing adapters. We introduce Stylus,
which efficiently selects and automatically composes task-specific adapters
based on a prompt's keywords. Stylus outlines a three-stage approach that first
summarizes adapters with improved descriptions and embeddings, retrieves
relevant adapters, and then further assembles adapters based on prompts'
keywords by checking how well they fit the prompt. To evaluate Stylus, we
developed StylusDocs, a curated dataset featuring 75K adapters with
pre-computed adapter embeddings. In our evaluation on popular Stable Diffusion
checkpoints, Stylus achieves greater CLIP-FID Pareto efficiency and is twice as
preferred, with humans and multimodal models as evaluators, over the base
model. See stylus-diffusion.github.io for more.
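
The retrieve-then-compose flow described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the `Adapter` class, the `retrieve` and `compose` helpers, the hand-written 3-dimensional embeddings, and the substring-based keyword check are all assumptions made for illustration. In particular, the substring check is a deliberately simple stand-in for whatever the paper uses to judge how well an adapter fits a prompt keyword, and the first stage (producing improved descriptions and embeddings) is assumed to have already run.

```python
import math
from dataclasses import dataclass


@dataclass
class Adapter:
    # Hypothetical record for one adapter; field names are illustrative.
    name: str
    description: str   # assumed output of stage 1 (improved description)
    embedding: list    # assumed pre-computed embedding of that description


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(prompt_embedding, adapters, k=2):
    """Stage 2 (sketch): keep the k adapters closest to the prompt embedding."""
    ranked = sorted(
        adapters,
        key=lambda a: cosine(prompt_embedding, a.embedding),
        reverse=True,
    )
    return ranked[:k]


def compose(keywords, candidates):
    """Stage 3 (sketch): map each prompt keyword to the retrieved adapters
    whose description mentions it. A substring test stands in for a real
    relevance check."""
    return {
        kw: [a.name for a in candidates if kw.lower() in a.description.lower()]
        for kw in keywords
    }


# Toy data: embeddings are hand-written, not produced by any real encoder.
adapters = [
    Adapter("watercolor-style", "watercolor painting style", [1.0, 0.0, 0.0]),
    Adapter("dragon-concept", "detailed dragon creature", [0.0, 1.0, 0.0]),
    Adapter("car-photo", "realistic sports car photo", [0.0, 0.0, 1.0]),
]
prompt_embedding = [0.6, 0.8, 0.1]  # stands in for "a dragon in watercolor"

top = retrieve(prompt_embedding, adapters, k=2)
selection = compose(["dragon", "watercolor"], top)
print(selection)
```

In this toy setup the unrelated car adapter is dropped at the retrieval step, and the two remaining adapters are each assigned to the keyword they describe. A real system would replace the hand-written vectors with a text encoder and the substring test with a stronger relevance judgment.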