
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

March 1, 2024
Authors: Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang
cs.AI

Abstract

Text-to-image customization, which aims to synthesize text-driven images for given subjects, has recently revolutionized content creation. Existing works follow the pseudo-word paradigm: they represent the given subjects as pseudo-words and then compose them with the given text. However, the influence scope of the pseudo-words is inherently entangled with the given text, which results in a dual-optimum paradox: similarity to the given subjects and controllability by the given text cannot both be optimal at the same time. We present RealCustom, which for the first time disentangles similarity from controllability by precisely limiting the subject's influence to the relevant parts only. It achieves this by gradually narrowing a real text word from its general connotation to the specific subject and using that word's cross-attention to distinguish relevance. Specifically, RealCustom introduces a novel "train-inference" decoupled framework: (1) during training, RealCustom learns a general alignment between visual conditions and the original textual conditions through a novel adaptive scoring module that adaptively modulates the influence quantity; (2) during inference, a novel adaptive mask guidance strategy iteratively updates the influence scope and influence quantity of the given subjects, gradually narrowing the generation of the real text word. Comprehensive experiments demonstrate the superior real-time customization ability of RealCustom in the open domain, achieving both unprecedented similarity to the given subjects and controllability by the given text for the first time. The project page is https://corleone-huang.github.io/realcustom/.