
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

August 28, 2024
Authors: Feize Wu, Yun Pang, Junyi Zhang, Lianyu Pang, Jian Yin, Baoquan Zhao, Qing Li, Xudong Mao
cs.AI

Abstract

Recent advances in text-to-image personalization have enabled high-quality and controllable image synthesis for user-provided concepts. However, existing methods still struggle to balance identity preservation with text alignment. Our approach is based on the fact that generating prompt-aligned images requires a precise semantic understanding of the prompt, which involves accurately processing the interactions between the new concept and its surrounding context tokens within the CLIP text encoder. To address this, we aim to embed the new concept properly into the input embedding space of the text encoder, allowing for seamless integration with existing tokens. We introduce Context Regularization (CoRe), which enhances the learning of the new concept's text embedding by regularizing its context tokens in the prompt. This is based on the insight that appropriate output vectors of the text encoder for the context tokens can only be achieved if the new concept's text embedding is correctly learned. CoRe can be applied to arbitrary prompts without requiring the generation of corresponding images, thus improving the generalization of the learned text embedding. Additionally, CoRe can serve as a test-time optimization technique to further enhance the generations for specific prompts. Comprehensive experiments demonstrate that our method outperforms several baseline methods in both identity preservation and text alignment. Code will be made publicly available.
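The abstract describes the context regularizer only at a high level. The sketch below illustrates the general idea under several stated assumptions. It uses a hypothetical toy causal "text encoder" that takes a running mean of token embeddings; this is not CLIP. It uses a reference prompt in which the new concept <V*> is replaced by an assumed super-category word such as "dog". It applies a penalty of one minus the cosine similarity between the encoder outputs of the context tokens in the two prompts. Gradients update only the concept embedding. All names (`toy_text_encoder`, `context_regularization_loss`, `optimize_concept`, the 2-D embeddings) are hypothetical. This is not the authors' implementation: a real setup would use autograd rather than finite differences and the actual CLIP text encoder.

```python
import math
from typing import Callable, List

Vector = List[float]


def toy_text_encoder(tokens: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a causal text encoder (NOT CLIP).

    Output i is the mean of input embeddings 0..i, so each output depends
    only on tokens at or before its position, loosely like causal attention.
    """
    outputs: List[Vector] = []
    running = [0.0] * len(tokens[0])
    for i, emb in enumerate(tokens, start=1):
        running = [r + e for r, e in zip(running, emb)]
        outputs.append([r / i for r in running])
    return outputs


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norms, 1e-12)  # guard against zero vectors


def context_regularization_loss(
    concept_prompt: List[Vector], reference_prompt: List[Vector], concept_index: int
) -> float:
    """Mean (1 - cosine) over context tokens, i.e. every position except the concept."""
    out_c = toy_text_encoder(concept_prompt)
    out_r = toy_text_encoder(reference_prompt)
    context = [i for i in range(len(concept_prompt)) if i != concept_index]
    return sum(1.0 - cosine(out_c[i], out_r[i]) for i in context) / len(context)


def optimize_concept(
    concept: Vector,
    build_prompt: Callable[[Vector], List[Vector]],
    reference_prompt: List[Vector],
    concept_index: int,
    lr: float = 0.5,
    steps: int = 100,
    h: float = 1e-5,
) -> Vector:
    """Gradient descent on the concept embedding only; all other tokens stay frozen.

    Uses forward finite differences instead of autograd to stay dependency-free.
    """
    v = list(concept)
    for _ in range(steps):
        base = context_regularization_loss(build_prompt(v), reference_prompt, concept_index)
        grad = []
        for d in range(len(v)):
            bumped = list(v)
            bumped[d] += h
            bumped_loss = context_regularization_loss(
                build_prompt(bumped), reference_prompt, concept_index
            )
            grad.append((bumped_loss - base) / h)
        v = [x - lr * g for x, g in zip(v, grad)]
    return v


# Hypothetical 2-D embeddings for a three-token prompt "a <V*> beach".
A = [1.0, 0.0]
BEACH = [1.0, 1.0]
DOG = [0.0, 1.0]  # assumed super-category token used in the reference prompt


def build_prompt(concept: Vector) -> List[Vector]:
    return [A, concept, BEACH]


reference = build_prompt(DOG)
start = [0.0, -1.0]
before = context_regularization_loss(build_prompt(start), reference, 1)
learned = optimize_concept(start, build_prompt, reference, 1)
after = context_regularization_loss(build_prompt(learned), reference, 1)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

The penalty is computed from text alone, so it can be evaluated on any prompt without generating images. This is the property the abstract highlights. Running the same loop on a single target prompt loosely mirrors the test-time use the abstract mentions.

The toy model has two limits worth noting. First, because the encoder is causal, only context tokens after <V*> are affected. Second, the cosine penalty constrains only a direction, so many embeddings reach zero loss. On its own, the penalty only pulls <V*> toward behaving like the reference word. In a full pipeline it would be combined with the usual image reconstruction objective rather than used alone.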

