IMAGDressing-v1: Customizable Virtual Dressing

Abstract

Recent advances have achieved realistic virtual try-on (VTON) through localized garment inpainting with latent diffusion models, significantly improving consumers' online shopping experience. However, existing VTON technologies overlook merchants' need to showcase garments comprehensively, with flexible control over garments and optional faces, poses, and scenes. To address this issue, we define a virtual dressing (VD) task focused on generating freely editable human images with fixed garments and optional conditions. We also design a comprehensive affinity metric index (CAMI) to evaluate the consistency between generated images and reference garments. We then propose IMAGDressing-v1, which incorporates a garment UNet that captures semantic features from CLIP and texture features from a VAE. We present a hybrid attention module, comprising a frozen self-attention and a trainable cross-attention, that integrates garment features from the garment UNet into a frozen denoising UNet, so that users can control different scenes through text. IMAGDressing-v1 can be combined with extension plugins such as ControlNet and IP-Adapter to enhance the diversity and controllability of generated images. Furthermore, to address the lack of data, we release the interactive garment pairing (IGPair) dataset, containing over 300,000 pairs of clothing and dressed images, and establish a standard pipeline for data assembly. Extensive experiments demonstrate that IMAGDressing-v1 achieves state-of-the-art human image synthesis performance under various controlled conditions. The code and model will be available at https://github.com/muzishen/IMAGDressing.
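The hybrid attention module is described only at a high level here, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the frozen self-attention output and the trainable garment cross-attention output share the same queries and are combined additively. The class name `HybridAttention`, the `garment_scale` weight, the shared-query design, and the random initialization are all hypothetical, introduced for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned projection weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class HybridAttention:
    """Hypothetical hybrid attention: frozen self-attention plus garment cross-attention."""

    def __init__(self, dim, rng, garment_scale=1.0):
        # Stand-ins for the frozen self-attention projections of the denoising UNet.
        self.frozen = {name: random_matrix(dim, dim, rng) for name in ("q", "k", "v")}
        # Stand-ins for the trainable key/value projections applied to garment features.
        self.trainable = {name: random_matrix(dim, dim, rng) for name in ("k", "v")}
        # Assumption: a scalar weight on the garment branch.
        self.garment_scale = garment_scale

    def __call__(self, hidden, garment):
        # Assumption: both branches reuse the same query projection.
        q = matmul(hidden, self.frozen["q"])
        self_out = attention(
            q, matmul(hidden, self.frozen["k"]), matmul(hidden, self.frozen["v"])
        )
        cross_out = attention(
            q, matmul(garment, self.trainable["k"]), matmul(garment, self.trainable["v"])
        )
        # Assumption: the two outputs are summed, with the garment branch scaled.
        return [
            [s + self.garment_scale * c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(self_out, cross_out)
        ]


rng = random.Random(0)
layer = HybridAttention(dim=4, rng=rng)
hidden = random_matrix(6, 4, rng)   # 6 latent tokens from the denoising UNet
garment = random_matrix(3, 4, rng)  # 3 garment tokens from the garment UNet
out = layer(hidden, garment)        # same shape as hidden: 6 tokens x 4 channels
```

In this sketch, setting `garment_scale` to 0 recovers the frozen self-attention output exactly, which is one way such a design could leave the base model's behavior untouched when no garment conditioning is wanted.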
