Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
January 24, 2024
Authors: Mehmet Saygin Seyfioglu, Karim Bouyarmane, Suren Kumar, Amir Tavanaei, Ismail B. Tutar
cs.AI
Abstract
As online shopping grows, the ability for buyers to virtually visualize products in their own settings, a phenomenon we define as "Virtual Try-All", has become crucial. Recent diffusion models inherently contain a world model, rendering them suitable for this task within an inpainting context. However, traditional image-conditioned diffusion models often fail to capture the fine-grained details of products. In contrast, personalization-driven models such as DreamPaint are good at preserving an item's details, but they are not optimized for real-time applications. We present "Diffuse to Choose," a novel diffusion-based image-conditioned inpainting model that efficiently balances fast inference with the retention of high-fidelity details in a given reference item, while ensuring accurate semantic manipulations in the given scene content. Our approach incorporates fine-grained features from the reference image directly into the latent feature maps of the main diffusion model, along with a perceptual loss that further preserves the reference item's details. We conduct extensive testing on both in-house and publicly available datasets, and show that Diffuse to Choose is superior to existing zero-shot diffusion inpainting methods as well as few-shot diffusion personalization algorithms like DreamPaint.
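
The core mechanism the abstract describes, injecting fine-grained reference-image features into the latent feature maps of the main diffusion U-Net and training with an added perceptual loss, can be sketched in PyTorch as follows. This is a minimal illustration only: every name below (RefEncoder, inject, perc_net, lambda_perc) is a hypothetical stand-in, not the authors' implementation, and the actual feature-merging operator and loss weighting in Diffuse to Choose may differ.

```python
# Hypothetical sketch of reference-feature injection plus a perceptual
# loss term, as described in the abstract. Module names, shapes, and the
# additive merging operator are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RefEncoder(nn.Module):
    """Hypothetical encoder producing multi-scale reference features,
    one feature map per U-Net resolution level."""

    def __init__(self, in_ch=4, widths=(64, 128, 256)):
        super().__init__()
        stages, c = [], in_ch
        for w in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(c, w, kernel_size=3, stride=2, padding=1),
                nn.SiLU()))
            c = w
        self.stages = nn.ModuleList(stages)

    def forward(self, ref_latent):
        feats, x = [], ref_latent
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats


def inject(unet_feat, ref_feat, proj):
    """Merge a reference feature map into a main U-Net feature map.
    `proj` is a learned 1x1 conv aligning channel counts; addition is
    one possible merging operator, used here for simplicity."""
    ref = F.interpolate(ref_feat, size=unet_feat.shape[-2:],
                        mode="bilinear", align_corners=False)
    return unet_feat + proj(ref)


def loss_fn(eps_pred, eps, decoded_pred, decoded_target, perc_net,
            lambda_perc=0.1):
    """Standard denoising objective plus a perceptual term computed on
    decoded images with a frozen feature extractor `perc_net`;
    lambda_perc is a hypothetical weighting."""
    denoise = F.mse_loss(eps_pred, eps)
    perceptual = F.mse_loss(perc_net(decoded_pred),
                            perc_net(decoded_target))
    return denoise + lambda_perc * perceptual
```

In a setup like this, `perc_net` would typically be a frozen pretrained network (e.g., VGG features), and the additive `inject` could be swapped for FiLM-style modulation or another conditioning mechanism; the abstract does not pin down either detail.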