CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

August 28, 2024
Authors: Feize Wu, Yun Pang, Junyi Zhang, Lianyu Pang, Jian Yin, Baoquan Zhao, Qing Li, Xudong Mao
cs.AI

Abstract

Recent advances in text-to-image personalization have enabled high-quality and controllable image synthesis for user-provided concepts. However, existing methods still struggle to balance identity preservation with text alignment. Our approach is based on the fact that generating prompt-aligned images requires a precise semantic understanding of the prompt, which involves accurately processing the interactions between the new concept and its surrounding context tokens within the CLIP text encoder. To address this, we aim to embed the new concept properly into the input embedding space of the text encoder, allowing for seamless integration with existing tokens. We introduce Context Regularization (CoRe), which enhances the learning of the new concept's text embedding by regularizing its context tokens in the prompt. This is based on the insight that appropriate output vectors of the text encoder for the context tokens can only be achieved if the new concept's text embedding is correctly learned. CoRe can be applied to arbitrary prompts without requiring the generation of corresponding images, thus improving the generalization of the learned text embedding. Additionally, CoRe can serve as a test-time optimization technique to further enhance the generations for specific prompts. Comprehensive experiments demonstrate that our method outperforms several baseline methods in both identity preservation and text alignment. Code will be made publicly available.
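
The idea of regularizing context tokens can be illustrated with a toy sketch. The abstract does not say what the context-token outputs are regularized toward, so this example assumes a reference prompt in which the new concept is replaced by an existing word, and it assumes a squared-error penalty. The encoder here is a single unmasked attention layer standing in for the CLIP text encoder, and all names and embedding values are hypothetical. It is not the authors' implementation.

```python
import math


def toy_text_encoder(token_embs):
    """Single unmasked self-attention layer with no learned weights.

    A stand-in for the CLIP text encoder (an assumption): every output
    vector depends on every input embedding, which is the only property
    this sketch needs.
    """
    dim = len(token_embs[0])
    outputs = []
    for query in token_embs:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in token_embs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        outputs.append([
            sum(w / norm * emb[i] for w, emb in zip(weights, token_embs))
            for i in range(dim)
        ])
    return outputs


def context_regularization_loss(prompt_embs, reference_embs, concept_pos):
    """Mean over context tokens of the squared distance between the encoder
    outputs of the concept prompt and of the assumed reference prompt."""
    out = toy_text_encoder(prompt_embs)
    ref = toy_text_encoder(reference_embs)
    context = [i for i in range(len(out)) if i != concept_pos]
    total = sum((a - b) ** 2
                for i in context
                for a, b in zip(out[i], ref[i]))
    return total / len(context)


# Hypothetical 3-dimensional embeddings for the prompt "a photo of <concept>".
context_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reference_word = [0.5, 0.5, 0.0]   # assumed existing token, e.g. a class noun
poor_concept = [2.0, -1.0, 0.5]    # a badly placed concept embedding

reference_prompt = context_tokens + [reference_word]
loss_matched = context_regularization_loss(
    context_tokens + [list(reference_word)], reference_prompt, concept_pos=3)
loss_poor = context_regularization_loss(
    context_tokens + [poor_concept], reference_prompt, concept_pos=3)

print(loss_matched, loss_poor)
```

In this toy setting the loss is zero when the concept embedding leaves the context-token outputs unchanged and positive when it distorts them. Only the text encoder is involved, which matches the abstract's point that the regularization can be applied to arbitrary prompts without generating images.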

