MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
October 17, 2024
Authors: Donghao Zhou, Jiancheng Huang, Jinbin Bai, Jiaze Wang, Hao Chen, Guangyong Chen, Xiaowei Hu, Pheng-Ann Heng
cs.AI
Abstract
Recent advancements in text-to-image (T2I) diffusion models have enabled the
creation of high-quality images from text prompts, but they still struggle to
generate images with precise control over specific visual concepts. Existing
approaches can replicate a given concept by learning from reference images, yet
they lack the flexibility for fine-grained customization of individual
components within the concept. In this paper, we introduce
component-controllable personalization, a novel task that pushes the boundaries
of T2I models by allowing users to reconfigure specific components when
personalizing visual concepts. This task is particularly challenging due to two
primary obstacles: semantic pollution, where unwanted visual elements corrupt
the personalized concept, and semantic imbalance, which causes disproportionate
learning of the concept and component. To overcome these challenges, we design
MagicTailor, an innovative framework that leverages Dynamic Masked Degradation
(DM-Deg) to dynamically perturb undesired visual semantics and Dual-Stream
Balancing (DS-Bal) to establish a balanced learning paradigm for desired visual
semantics. Extensive comparisons, ablations, and analyses demonstrate that
MagicTailor not only excels in this challenging task but also holds significant
promise for practical applications, paving the way for more nuanced and
creative image generation.