Nested Attention: Semantic-aware Attention Values for Concept Personalization
January 2, 2025
Authors: Or Patashnik, Rinon Gal, Daniil Ostashev, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or
cs.AI
Abstract
Personalizing text-to-image models to generate images of specific subjects
across diverse scenes and styles is a rapidly advancing field. Current
approaches often face challenges in maintaining a balance between identity
preservation and alignment with the input text prompt. Some methods rely on a
single textual token to represent a subject, which limits expressiveness, while
others employ richer representations but disrupt the model's prior, diminishing
prompt alignment. In this work, we introduce Nested Attention, a novel
mechanism that injects a rich and expressive image representation into the
model's existing cross-attention layers. Our key idea is to generate
query-dependent subject values, derived from nested attention layers that learn
to select relevant subject features for each region in the generated image. We
integrate these nested layers into an encoder-based personalization method, and
show that they enable high identity preservation while adhering to input text
prompts. Our approach is general and can be trained on various domains.
Additionally, its prior preservation allows us to combine multiple personalized
subjects from different domains in a single image.
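The core idea above — an inner attention that produces a query-dependent subject value for each image region, which the outer cross-attention then consumes at the subject token's position — can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: the function names, the weight matrices `W_k`/`W_v`, and the assumption that token 0 is the subject placeholder are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nested_attention(Q, K, V, subj_feats, W_k, W_v, subj_idx=0):
    """Toy sketch of nested attention (illustrative, not the paper's code).

    Q:          (n_queries, d)  spatial queries of the generated image
    K, V:       (n_tokens, d)   text-prompt keys/values of the outer layer
    subj_feats: (n_feats, d)    rich per-subject features from an encoder
    subj_idx:   position of the subject placeholder token (assumed 0 here)
    """
    d = Q.shape[-1]
    scale = 1.0 / np.sqrt(d)

    # Inner attention: each spatial query attends over the subject's
    # feature set, selecting the features most relevant to its region.
    K_s = subj_feats @ W_k                    # (n_feats, d)
    V_s = subj_feats @ W_v                    # (n_feats, d)
    inner = softmax(Q @ K_s.T * scale)        # (n_queries, n_feats)
    v_star = inner @ V_s                      # query-dependent subject values

    # Outer attention: ordinary cross-attention over the text tokens,
    # except the subject token's value is replaced per-query by v_star,
    # leaving the rest of the prompt's attention (the prior) untouched.
    attn = softmax(Q @ K.T * scale)           # (n_queries, n_tokens)
    out = attn @ V
    w = attn[:, subj_idx:subj_idx + 1]        # weight on the subject token
    out = out - w * V[subj_idx] + w * v_star
    return out
```

Because only the single placeholder token's value is swapped, the outer attention distribution over the remaining prompt tokens is unchanged, which is consistent with the prior preservation the abstract claims.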