IMAGDressing-v1: Customizable Virtual Dressing
July 17, 2024
Authors: Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinghui Tang
cs.AI
Abstract
Latest advances have achieved realistic virtual try-on (VTON) through
localized garment inpainting using latent diffusion models, significantly
enhancing consumers' online shopping experience. However, existing VTON
technologies neglect the need for merchants to showcase garments
comprehensively, including flexible control over garments, optional faces,
poses, and scenes. To address this issue, we define a virtual dressing (VD)
task focused on generating freely editable human images with fixed garments and
optional conditions. Meanwhile, we design a comprehensive affinity metric index
(CAMI) to evaluate the consistency between generated images and reference
garments. Then, we propose IMAGDressing-v1, which incorporates a garment UNet
that captures semantic features from CLIP and texture features from VAE. We
present a hybrid attention module, including a frozen self-attention and a
trainable cross-attention, to integrate garment features from the garment UNet
into a frozen denoising UNet, ensuring users can control different scenes
through text. IMAGDressing-v1 can be combined with other extension plugins,
such as ControlNet and IP-Adapter, to enhance the diversity and controllability
of generated images. Furthermore, to address the lack of data, we release the
interactive garment pairing (IGPair) dataset, containing over 300,000 pairs of
clothing and dressed images, and establish a standard pipeline for data
assembly. Extensive experiments demonstrate that our IMAGDressing-v1 achieves
state-of-the-art human image synthesis performance under various controlled
conditions. The code and model will be available at
https://github.com/muzishen/IMAGDressing.
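The hybrid attention module is the central architectural idea in the abstract: a frozen self-attention branch from the pretrained denoising UNet combined with a trainable cross-attention branch that attends to garment features from the garment UNet. Below is a minimal PyTorch sketch of that idea under stated assumptions; the class name `HybridAttention`, the additive fusion, and all tensor dimensions are illustrative and do not reflect the released implementation.

```python
# Minimal sketch of a hybrid attention block: frozen self-attention plus a
# trainable cross-attention over garment tokens. All names, shapes, and the
# additive fusion are assumptions for illustration only.
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Frozen self-attention, standing in for the pretrained denoising UNet branch.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        # Trainable cross-attention that injects garment features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hidden_states: torch.Tensor, garment_features: torch.Tensor) -> torch.Tensor:
        # hidden_states:    (B, N, C) latent tokens inside the frozen denoising UNet
        # garment_features: (B, M, C) tokens from the garment UNet
        #                   (semantic features from CLIP, texture features from VAE)
        self_out, _ = self.self_attn(hidden_states, hidden_states, hidden_states)
        cross_out, _ = self.cross_attn(self.norm(self_out), garment_features, garment_features)
        # Additive fusion keeps the frozen branch intact while blending in garment detail.
        return self_out + cross_out


if __name__ == "__main__":
    attn = HybridAttention(dim=320)
    latents = torch.randn(1, 64 * 64, 320)   # example latent tokens
    garment = torch.randn(1, 333, 320)       # example garment tokens (illustrative size)
    print(attn(latents, garment).shape)      # torch.Size([1, 4096, 320])
```

Because only the cross-attention (and normalization) parameters are trainable, a scheme like this would leave the base text-to-image UNet untouched, which is consistent with the abstract's claim that users retain text control over scenes while garment features are injected.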