MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
April 17, 2024
Authors: Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman
cs.AI
Abstract
We introduce a new architecture for personalization of text-to-image
diffusion models, coined Mixture-of-Attention (MoA). Inspired by the
Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA
distributes the generation workload between two attention pathways: a
personalized branch and a non-personalized prior branch. MoA is designed to
retain the original model's prior by fixing its attention layers in the prior
branch, while minimally intervening in the generation process with the
personalized branch that learns to embed subjects in the layout and context
generated by the prior branch. A novel routing mechanism manages the
distribution of pixels in each layer across these branches to optimize the
blend of personalized and generic content creation. Once trained, MoA
facilitates the creation of high-quality, personalized images featuring
multiple subjects with compositions and interactions as diverse as those
generated by the original model. Crucially, MoA enhances the distinction
between the model's pre-existing capability and the newly augmented
personalized intervention, thereby offering a more disentangled subject-context
control that was previously unattainable. Project page:
https://snap-research.github.io/mixture-of-attention
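The core mechanism described above — a frozen prior attention branch, a trainable personalized branch, and a per-pixel router that softly blends their outputs — can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the function names, parameter shapes, and the use of a single linear router are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # standard scaled dot-product self-attention
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d), axis=-1) @ v

def moa_layer(x, prior_params, personal_params, router_w):
    """One Mixture-of-Attention layer (illustrative sketch).

    x               : (num_pixels, dim) latent pixel features
    prior_params    : frozen (Wq, Wk, Wv) from the pretrained model
    personal_params : trainable (Wq, Wk, Wv) of the personalized branch
    router_w        : (dim, 2) router producing per-pixel branch weights
    """
    # prior branch: attention layers kept fixed to preserve the model's prior
    q0, k0, v0 = (x @ W for W in prior_params)
    out_prior = attention(q0, k0, v0)

    # personalized branch: learns to embed subjects into the prior's layout
    q1, k1, v1 = (x @ W for W in personal_params)
    out_personal = attention(q1, k1, v1)

    # router: soft per-pixel distribution over the two branches
    gate = softmax(x @ router_w, axis=-1)  # (num_pixels, 2)
    return gate[:, :1] * out_prior + gate[:, 1:] * out_personal
```

Because the prior branch stays frozen, pixels the router sends there are generated exactly as the original model would, which is what yields the disentangled subject-context control the abstract describes.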