RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
March 1, 2024
Authors: Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang
cs.AI
Abstract
Text-to-image customization, which aims to synthesize text-driven images for
the given subjects, has recently revolutionized content creation. Existing
works follow the pseudo-word paradigm, i.e., represent the given subjects as
pseudo-words and then compose them with the given text. However, the inherent
entangled influence scope of pseudo-words with the given text results in a
dual-optimum paradox, i.e., the similarity of the given subjects and the
controllability of the given text cannot be optimal simultaneously. We
present RealCustom that, for the first time, disentangles similarity from
controllability by precisely limiting subject influence to relevant parts only,
achieved by gradually narrowing the real text word from its general connotation to
the specific subject and using its cross-attention to distinguish relevance.
Specifically, RealCustom introduces a novel "train-inference" decoupled
framework: (1) during training, RealCustom learns general alignment between
visual conditions and the original textual conditions via a novel adaptive scoring
module that adaptively modulates influence quantity; (2) during inference, a novel
adaptive mask guidance strategy is proposed to iteratively update the influence
scope and influence quantity of the given subjects to gradually narrow the
generation of the real text word. Comprehensive experiments demonstrate the
superior real-time customization ability of RealCustom in the open domain,
achieving both unprecedented similarity of the given subjects and
controllability of the given text for the first time. The project page is
https://corleone-huang.github.io/realcustom/.
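
The abstract does not say how the influence scope is derived from cross-attention, so the following is only a minimal sketch of the general idea, not the RealCustom implementation. It assumes the scope is a binary mask that keeps the highest-scoring fraction of a real word's attention map, and that subject-conditioned features are used only inside that mask. The names influence_mask, masked_blend, and keep_ratio are hypothetical.

```python
def influence_mask(attn, keep_ratio=0.25):
    """Binary mask over a 2D attention map (list of rows).

    Assumption: positions whose score is among the top `keep_ratio`
    fraction count as relevant to the subject. Ties at the threshold
    are all kept, so slightly more than that fraction may be selected.
    """
    flat = sorted((v for row in attn for v in row), reverse=True)
    k = max(1, int(len(flat) * keep_ratio))
    threshold = flat[k - 1]
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in attn]


def masked_blend(text_feat, subject_feat, mask):
    """Take subject-conditioned values inside the mask, text-only values outside."""
    return [
        [m * s + (1.0 - m) * t for t, s, m in zip(t_row, s_row, m_row)]
        for t_row, s_row, m_row in zip(text_feat, subject_feat, mask)
    ]


# Toy 2x2 attention map for one real text word (made-up numbers).
attn = [[0.1, 0.2],
        [0.7, 0.9]]
mask = influence_mask(attn, keep_ratio=0.5)

text_only = [[1.0, 1.0], [1.0, 1.0]]      # stand-in for text-conditioned features
with_subject = [[5.0, 5.0], [5.0, 5.0]]   # stand-in for subject-conditioned features
blended = masked_blend(text_only, with_subject, mask)
```

In this toy case only the bottom row exceeds the threshold, so the subject affects that row alone and the top row stays purely text-driven. An iterative scheme like the one described above would recompute the mask at each generation step.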