Stylus: Automatic Adapter Selection for Diffusion Models
April 29, 2024
Authors: Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Ruslan Salakhutdinov, Ion Stoica
cs.AI
Abstract
Beyond scaling base models with more data or parameters, fine-tuned adapters
provide an alternative way to generate high-fidelity, custom images at reduced
costs. As such, adapters have been widely adopted by open-source communities,
accumulating a database of over 100K adapters, most of which are highly
customized with insufficient descriptions. This paper explores the problem of
matching the prompt to a set of relevant adapters, building on recent work that
highlights the performance gains of composing adapters. We introduce Stylus,
which efficiently selects and automatically composes task-specific adapters
based on a prompt's keywords. Stylus outlines a three-stage approach that first
summarizes adapters with improved descriptions and embeddings, retrieves
relevant adapters, and then further assembles adapters based on prompts'
keywords by checking how well they fit the prompt. To evaluate Stylus, we
developed StylusDocs, a curated dataset featuring 75K adapters with
pre-computed adapter embeddings. In our evaluation on popular Stable Diffusion
checkpoints, Stylus achieves greater CLIP-FID Pareto efficiency and is twice as
preferred, with humans and multimodal models as evaluators, over the base
model. See stylus-diffusion.github.io for more.
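
The retrieval stage described in the abstract can be illustrated with a minimal sketch. It assumes each adapter already has a pre-computed embedding and that the prompt has been embedded into the same space. The function names, adapter names, and toy three-dimensional vectors below are hypothetical and chosen only for illustration; this is not the authors' implementation, and the summarization and composition stages are not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(prompt_vec, adapter_vecs, k=2):
    """Return the names of the k adapters whose embeddings are closest to the prompt."""
    ranked = sorted(
        adapter_vecs.items(),
        key=lambda item: cosine(prompt_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical adapter embeddings (toy values, not taken from StylusDocs).
adapters = {
    "watercolor_style": [0.9, 0.1, 0.0],
    "dragon_concept": [0.1, 0.9, 0.2],
    "pixel_art": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a prompt such as "a watercolor painting of a dragon".
prompt = [0.7, 0.6, 0.1]

top_adapters = retrieve(prompt, adapters, k=2)
print(top_adapters)
```

In this sketch the ranking is plain cosine similarity over the stored vectors; a later stage would still need to check how well each retrieved adapter fits the prompt's keywords before composing them.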