DreamTuner: Single Image is Enough for Subject-Driven Generation
December 21, 2023
Authors: Miao Hua, Jiawei Liu, Fei Ding, Wei Liu, Jie Wu, Qian He
cs.AI
Abstract
Diffusion-based models have demonstrated impressive capabilities for
text-to-image generation and are promising for personalized applications of
subject-driven generation, which require generating customized concepts from
one or a few reference images. However, existing methods based on
fine-tuning fail to balance the trade-off between subject learning and the
maintenance of the generation capabilities of pretrained models. Moreover,
other methods that utilize additional image encoders tend to lose important
details of the subject due to encoding compression. To address these
challenges, we propose DreamTuner, a novel method that injects reference
information from coarse to fine to achieve subject-driven image generation more
effectively. DreamTuner introduces a subject-encoder for coarse subject
identity preservation, where the compressed general subject features are
introduced through an attention layer before visual-text cross-attention. We
then modify the self-attention layers within pretrained text-to-image models to
self-subject-attention layers to refine the details of the target subject. The
generated image queries detailed features from both the reference image and
itself in self-subject-attention. It is worth emphasizing that
self-subject-attention is an effective, elegant, and training-free method for
maintaining the detailed features of customized subjects and can serve as a
plug-and-play solution during inference. Finally, with additional
subject-driven fine-tuning, DreamTuner achieves remarkable performance in
subject-driven image generation, which can be controlled by text or other
conditions such as pose. For further details, please visit the project page at
https://dreamtuner-diffusion.github.io/.
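The self-subject-attention idea described in the abstract can be sketched in a few lines: queries come from the generated image's features, while keys and values are drawn from the concatenation of the generated image's own features and the reference image's features, so generation can pull fine subject details from the reference. This is a minimal, single-head, unbatched numpy illustration under assumed shapes; the function name, weight layout, and the absence of any reference-weighting term are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_subject_attention(gen_feats, ref_feats, w_q, w_k, w_v):
    """Illustrative self-subject-attention (hypothetical helper).

    gen_feats: (n_gen, d) features of the image being generated
    ref_feats: (n_ref, d) features of the reference image
    Queries use only gen_feats; keys/values use gen_feats AND ref_feats,
    so the generated image can query detailed features from both itself
    and the reference, as the abstract describes.
    """
    q = gen_feats @ w_q                               # (n_gen, d)
    kv_source = np.concatenate([gen_feats, ref_feats], axis=0)
    k = kv_source @ w_k                               # (n_gen + n_ref, d)
    v = kv_source @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)                   # rows sum to 1
    return attn @ v                                   # (n_gen, d)
```

Because the operation only changes where keys and values are gathered from, it can reuse the pretrained self-attention weights unchanged, which is why the abstract can call it training-free and plug-and-play at inference time.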