MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

Abstract

Recent advances in text-to-image (T2I) diffusion models have made it possible to generate high-quality images from text prompts, but these models still struggle to give precise control over specific visual concepts. Existing approaches can replicate a given concept by learning from reference images, yet they lack the flexibility to customize individual components within that concept at a fine-grained level. In this paper, we introduce component-controllable personalization, a novel task that pushes the boundaries of T2I models by allowing users to reconfigure specific components when personalizing visual concepts. This task is particularly challenging because of two primary obstacles: semantic pollution, where unwanted visual elements corrupt the personalized concept, and semantic imbalance, which causes disproportionate learning of the concept and the component. To overcome these challenges, we design MagicTailor, a framework that uses Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics and Dual-Stream Balancing (DS-Bal) to establish a balanced learning paradigm for desired visual semantics. Extensive comparisons, ablations, and analyses demonstrate that MagicTailor not only excels at this challenging task but also holds significant promise for practical applications, paving the way for more nuanced and creative image generation.
