AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

Abstract

Recent advances in diffusion-based, garment-centric image generation from text and image prompts have been impressive. However, existing methods do not support arbitrary combinations of garments, and they struggle to preserve garment details while remaining faithful to the text prompt, which limits their performance across diverse scenarios. In this paper, we focus on a new task, Multi-Garment Virtual Dressing, and propose AnyDressing, a novel method for customizing characters conditioned on any combination of garments and any personalized text prompt. AnyDressing comprises two primary networks: GarmentsNet, which extracts detailed clothing features, and DressingNet, which generates the customized images. Specifically, we propose an efficient and scalable module in GarmentsNet, the Garment-Specific Feature Extractor, which encodes the texture of each garment individually and in parallel. This design prevents garment confusion while keeping the network efficient. In DressingNet, we design an adaptive Dressing-Attention mechanism and a novel Instance-Level Garment Localization Learning strategy to accurately inject the features of each garment into its corresponding region. This approach efficiently integrates multi-garment texture cues into the generated images and further improves text-image consistency. Additionally, we introduce a Garment-Enhanced Texture Learning strategy to improve the fine-grained texture details of garments. Thanks to this design, AnyDressing can serve as a plug-in module that integrates easily with community control extensions for diffusion models, improving the diversity and controllability of the synthesized images. Extensive experiments show that AnyDressing achieves state-of-the-art results.
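The abstract gives no formulas for these components, so the NumPy sketch below is only a hypothetical illustration of the ideas it names, not the AnyDressing implementation. It assumes that (1) the Garment-Specific Feature Extractor can be approximated by one weight-shared encoder applied to garments stacked along the batch axis, so each garment is encoded in parallel without mixing with the others; (2) Dressing-Attention can be modeled as each latent token attending jointly to the person tokens and to all garment tokens; and (3) Instance-Level Garment Localization Learning can be modeled as a mean-squared error between each garment's attention map and its region mask. The names `encode_garments`, `dressing_attention`, and `localization_loss`, the `tanh` projection, and the MSE objective are all illustrative assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def encode_garments(garments: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the Garment-Specific Feature Extractor.

    garments: (K, T, C) array, K garments stacked on the batch axis.
    w: (C, D) weights shared by every garment.
    One batched matmul encodes all garments in parallel; nothing mixes
    tokens across the K axis, so one garment cannot leak into another.
    """
    return np.tanh(garments @ w)  # (K, T, D)


def dressing_attention(person: np.ndarray, garment_feats: np.ndarray):
    """Hypothetical joint attention over person tokens and all garment tokens.

    person: (N, D) latent tokens of the image being generated.
    garment_feats: (K, T, D) encoded garments.
    Returns the attended output (N, D) and the per-garment attention
    mass (K, N), i.e. how strongly each latent token attends to each garment.
    """
    k, t, d = garment_feats.shape
    n = person.shape[0]
    # Keys/values: person tokens, then garment 0's tokens, garment 1's, and so on.
    kv = np.concatenate([person, garment_feats.reshape(k * t, d)], axis=0)
    attn = softmax(person @ kv.T / np.sqrt(d), axis=-1)  # (N, N + K*T)
    out = attn @ kv  # (N, D)
    # Sum each garment's T columns to get one attention map per garment.
    garment_attn = attn[:, n:].reshape(n, k, t).sum(axis=-1).T  # (K, N)
    return out, garment_attn


def localization_loss(garment_attn: np.ndarray, masks: np.ndarray) -> float:
    """Hypothetical instance-level localization objective: mean squared
    error between each garment's attention map and its binary region mask."""
    return float(np.mean((garment_attn - masks) ** 2))


# Toy usage: 3 garments with 4 tokens each, 10 latent tokens.
rng = np.random.default_rng(0)
K, T, C, D, N = 3, 4, 8, 16, 10
garments = rng.normal(size=(K, T, C))
w = rng.normal(size=(C, D)) / np.sqrt(C)
feats = encode_garments(garments, w)
person = rng.normal(size=(N, D))
out, garment_attn = dressing_attention(person, feats)
masks = np.zeros((K, N))
masks[0, :3] = 1.0   # toy region for garment 0 (e.g. a top)
masks[1, 3:7] = 1.0  # toy region for garment 1 (e.g. trousers)
masks[2, 7:] = 1.0   # toy region for garment 2 (e.g. shoes)
print(out.shape, garment_attn.shape, localization_loss(garment_attn, masks))
```

Stacking garments on the batch axis is one plausible reading of "individually encode garment textures in parallel": the weights are shared, yet changing one garment cannot alter another garment's features. The per-garment attention map returned by `dressing_attention` is the quantity a localization objective of this kind would supervise, pulling each garment's influence toward its own region of the image.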
