Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
January 24, 2024
Authors: Mehmet Saygin Seyfioglu, Karim Bouyarmane, Suren Kumar, Amir Tavanaei, Ismail B. Tutar
cs.AI
Abstract
As online shopping is growing, the ability for buyers to virtually visualize
products in their settings, a phenomenon we define as "Virtual Try-All", has
become crucial. Recent diffusion models inherently contain a world model,
rendering them suitable for this task within an inpainting context. However,
traditional image-conditioned diffusion models often fail to capture the
fine-grained details of products. In contrast, personalization-driven models
such as DreamPaint are good at preserving an item's details, but they are not
optimized for real-time applications. We present "Diffuse to Choose," a novel
diffusion-based image-conditioned inpainting model that efficiently balances
fast inference with the retention of high-fidelity details in a given reference
item while ensuring accurate semantic manipulations in the given scene content.
Our approach is based on incorporating fine-grained features from the reference
image directly into the latent feature maps of the main diffusion model,
along with a perceptual loss to further preserve the reference item's
details. We conduct extensive testing on both in-house and publicly available
datasets, and show that Diffuse to Choose is superior to existing zero-shot
diffusion inpainting methods as well as few-shot diffusion personalization
algorithms like DreamPaint.
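The core mechanism described in the abstract, injecting fine-grained reference-image features directly into the main diffusion model's latent feature maps and supervising with a perceptual loss, can be sketched in a minimal NumPy toy. This is an illustrative sketch only: the function names, the additive injection, the mask shape, and the plain mean-squared feature distance are all assumptions for clarity, not the paper's actual architecture or loss.

```python
import numpy as np

def inject_reference_features(unet_latents, ref_features, mask):
    """Add reference-item features into the U-Net latent map, restricted to
    the inpainting mask. All feature maps are (C, H, W); mask is (H, W) with
    1.0 where the reference item should appear."""
    return unet_latents + ref_features * mask[None, :, :]

def perceptual_loss(gen_features, ref_features):
    """Mean squared distance between feature maps of a frozen encoder
    (a stand-in here for perceptual features such as VGG activations)."""
    return float(np.mean((gen_features - ref_features) ** 2))

# Toy data: latent maps of the main diffusion model and of the reference item.
rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 8, 8))
ref = rng.standard_normal((4, 8, 8))

# Inpainting mask: the item occupies the central region of the scene.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

fused = inject_reference_features(latents, ref, mask)
loss = perceptual_loss(fused, ref)
```

Outside the masked region `fused` is unchanged, reflecting the abstract's claim that the scene content is semantically preserved while only the reference item's details are carried into the latents.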