StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation
September 4, 2023
Authors: Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo
cs.AI
Abstract
This paper presents a LoRA-free method for stylized image generation that
takes a text prompt and style reference images as inputs and produces an output
image in a single pass. Unlike existing methods that rely on training a
separate LoRA for each style, our method can adapt to various styles with a
unified model. However, this poses two challenges: 1) the prompt loses
controllability over the generated content, and 2) the output image inherits
both the semantic and style features of the style reference image, compromising
its content fidelity. To address these challenges, we introduce StyleAdapter, a
model that comprises two components: a two-path cross-attention module (TPCA)
and three decoupling strategies. These components enable our model to process
the prompt and style reference features separately and reduce the strong
coupling between the semantic and style information in the style references.
StyleAdapter can generate high-quality images that match the content of the
prompts and adopt the style of the references (even for unseen styles) in a
single pass, which is more flexible and efficient than previous methods.
Experiments demonstrate that our method outperforms previous works.
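To make the two-path idea concrete, below is a minimal NumPy sketch of a two-path cross-attention step in the spirit of TPCA: the hidden features attend to the prompt embeddings and the style-reference embeddings through separate attention paths, and the two results are fused. The tensor shapes, the residual connection, and the fusion coefficient `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, wq, wk, wv):
    # Single-head cross-attention: query tokens attend to context tokens.
    q = query @ wq
    k = context @ wk
    v = context @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def tpca(hidden, text_emb, style_emb, params, lam=0.5):
    # Two-path cross-attention sketch: the prompt path and the style path
    # use separate projection weights, so prompt and style features are
    # processed independently before fusion. `lam` is a hypothetical
    # fusion coefficient, not a value from the paper.
    out_text = cross_attention(hidden, text_emb, *params["text"])
    out_style = cross_attention(hidden, style_emb, *params["style"])
    return hidden + out_text + lam * out_style

# Example usage with toy shapes (4 hidden tokens, 8-dim features).
rng = np.random.default_rng(0)
params = {"text": [rng.standard_normal((8, 8)) for _ in range(3)],
          "style": [rng.standard_normal((8, 8)) for _ in range(3)]}
out = tpca(rng.standard_normal((4, 8)),   # hidden features
           rng.standard_normal((6, 8)),   # prompt embeddings
           rng.standard_normal((5, 8)),   # style-reference embeddings
           params)
```

Keeping the two paths' projection weights separate is what lets the prompt retain control over content while the style reference contributes style, rather than mixing both signals through a single attention path.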