

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

April 17, 2024
Authors: Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman
cs.AI

Abstract

We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA is designed to retain the original model's prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch. A novel routing mechanism manages the distribution of pixels in each layer across these branches to optimize the blend of personalized and generic content creation. Once trained, MoA facilitates the creation of high-quality, personalized images featuring multiple subjects with compositions and interactions as diverse as those generated by the original model. Crucially, MoA enhances the distinction between the model's pre-existing capability and the newly augmented personalized intervention, thereby offering a more disentangled subject-context control that was previously unattainable. Project page: https://snap-research.github.io/mixture-of-attention
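The abstract describes two attention pathways whose outputs are blended per pixel by a learned router. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the layer shapes, the `moa_layer` function, and the single-matrix router are all simplified assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # standard scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def moa_layer(x, ctx, W_prior, W_pers, W_router):
    """Hypothetical Mixture-of-Attention layer: each pixel token's output
    is a router-weighted blend of a frozen prior branch and a learned
    personalized branch (a sketch of the mechanism, not the paper's code)."""
    # prior branch: attention weights frozen from the base model
    prior_out = attention(x @ W_prior, ctx @ W_prior, ctx @ W_prior)
    # personalized branch: learned weights that embed the subject
    pers_out = attention(x @ W_pers, ctx @ W_pers, ctx @ W_pers)
    # router: per-pixel soft assignment over the two branches
    w = softmax(x @ W_router, axis=-1)           # shape (n_pixels, 2)
    return w[:, :1] * prior_out + w[:, 1:] * pers_out

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((16, d))   # 16 "pixel" tokens
ctx = rng.standard_normal((4, d))  # 4 context (e.g. text) tokens
out = moa_layer(x, ctx,
                rng.standard_normal((d, d)),
                rng.standard_normal((d, 2 * 0 + d)),
                rng.standard_normal((d, 2)))
print(out.shape)  # (16, 8)
```

Because the router weights sum to 1 per pixel, pixels routed mostly to the prior branch keep the base model's layout and context, while subject pixels can be handed to the personalized branch, which is the disentanglement the abstract claims.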

