Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Abstract

Image-based virtual try-on (VTON) aims to generate a virtual try-on result by transferring an input garment onto an image of a target person. However, paired garment-model data is scarce, which makes it difficult for existing methods to achieve strong generalization and high quality, and also limits their ability to generate mask-free try-on results. To address this scarcity, approaches such as Stable Garment and MMTryon adopt a synthetic data strategy that effectively increases the amount of paired data on the model side. Even so, existing methods are typically restricted to specific try-on tasks and are not user-friendly. To improve the generalization and controllability of VTON generation, we propose Any2AnyTryon, which generates try-on results from different textual instructions and model garment images to meet a variety of needs, without relying on masks, poses, or other conditions. Specifically, we first construct LAION-Garment, the largest known open-source garment try-on dataset. We then introduce adaptive position embedding, which enables the model to generate satisfactory outfitted model images or garment images from input images of different sizes and categories, significantly improving the generalization and controllability of VTON generation. In our experiments, we demonstrate the effectiveness of Any2AnyTryon and compare it with existing methods. The results show that Any2AnyTryon enables flexible, controllable, and high-quality image-based virtual try-on generation.

Project page: https://logn-2024.github.io/Any2anyTryonProjectPage/

