Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All

Abstract

As online shopping grows, the ability of buyers to virtually visualize products in their own settings, a phenomenon we define as "Virtual Try-All", has become crucial. Recent diffusion models inherently contain a world model, which makes them suitable for this task when framed as inpainting. However, traditional image-conditioned diffusion models often fail to capture the fine-grained details of products. In contrast, personalization-driven models such as DreamPaint preserve an item's details well but are not optimized for real-time applications. We present "Diffuse to Choose," a novel diffusion-based image-conditioned inpainting model that balances fast inference with high-fidelity retention of a reference item's details, while ensuring accurate semantic manipulations of the given scene content. Our approach incorporates fine-grained features from the reference image directly into the latent feature maps of the main diffusion model, and adds a perceptual loss to further preserve the reference item's details. We conduct extensive testing on both in-house and publicly available datasets and show that Diffuse to Choose outperforms existing zero-shot diffusion inpainting methods as well as few-shot diffusion personalization algorithms such as DreamPaint.
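The abstract names two training ingredients: the usual latent diffusion denoising objective and an added perceptual loss. The block below is a minimal sketch of how such a combined objective is commonly written. It assumes a standard epsilon-prediction latent diffusion loss and a feature-matching perceptual term computed with a pretrained network; the weight lambda, the layer set l, the feature extractor phi, and the choice to compare the decoded prediction against the ground-truth image are all illustrative assumptions, not the paper's stated formulation.

```latex
% Noisy latent at timestep t (standard DDPM forward process)
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% One-step estimate of the clean latent, decoded to image space by the VAE decoder D
\hat{z}_0 = \frac{z_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(z_t, t, c)}{\sqrt{\bar{\alpha}_t}},
\qquad \hat{x}_0 = D(\hat{z}_0)

% Combined objective (assumed form): denoising loss plus weighted perceptual term,
% where c is the conditioning (masked scene and reference item) and
% \phi_l are feature maps of a pretrained network at layer l
\mathcal{L} = \mathbb{E}_{z_0, \epsilon, t}\!\left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 \right]
  + \lambda \sum_{l} \lVert \phi_l(\hat{x}_0) - \phi_l(x_0) \rVert_2^2
```

The first term alone trains the denoiser in latent space; the second term penalizes differences in deep features of the decoded image, which is the usual rationale for adding a perceptual loss when fine details such as logos or textures must survive.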