Click2Mask: Local Editing with Dynamic Mask Generation
September 12, 2024
Authors: Omer Regev, Omri Avrahami, Dani Lischinski
cs.AI
Abstract
Recent advancements in generative models have revolutionized image generation
and editing, making these tasks accessible to non-experts. This paper focuses
on local image editing, particularly the task of adding new content to a
loosely specified area. Existing methods often require a precise mask or a
detailed description of the location, which can be cumbersome and prone to
errors. We propose Click2Mask, a novel approach that simplifies the local
editing process by requiring only a single point of reference (in addition to
the content description). A mask is dynamically grown around this point during
a Blended Latent Diffusion (BLD) process, guided by a masked CLIP-based
semantic loss. Click2Mask surpasses the limitations of segmentation-based and
fine-tuning dependent methods, offering a more user-friendly and contextually
accurate solution. Our experiments demonstrate that Click2Mask not only
minimizes user effort but also delivers competitive or superior local image
manipulation results compared to SoTA methods, according to both human
judgement and automatic metrics. Key contributions include the simplification
of user input, the ability to freely add objects unconstrained by existing
segments, and the integration potential of our dynamic mask approach within
other editing methods.
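To give a rough intuition for the dynamic mask described above, the following is a deliberately simplified, hypothetical sketch (not the authors' implementation): starting from a single click, a binary mask is grown over several steps by admitting neighboring pixels whose semantic score passes a threshold. In the actual method the guidance comes from a masked CLIP-based loss inside the Blended Latent Diffusion process; here `score_map`, `grow_mask`, and the threshold are stand-in assumptions used only to illustrate the growth loop.

```python
import numpy as np

def grow_mask(score_map: np.ndarray, click: tuple[int, int],
              steps: int = 10, threshold: float = 0.5) -> np.ndarray:
    """Grow a binary mask around `click` into high-scoring neighbors.

    `score_map` is a toy placeholder for the semantic guidance signal;
    the real method uses a masked CLIP-based loss during diffusion.
    """
    h, w = score_map.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[click] = True
    for _ in range(steps):
        # Dilate the current mask by one pixel (4-connectivity).
        dilated = mask.copy()
        dilated[1:, :] |= mask[:-1, :]
        dilated[:-1, :] |= mask[1:, :]
        dilated[:, 1:] |= mask[:, :-1]
        dilated[:, :-1] |= mask[:, 1:]
        # Admit only newly reached pixels whose score passes the threshold.
        frontier = dilated & ~mask
        mask |= frontier & (score_map >= threshold)
    return mask

# Toy example: a high-scoring square region containing the click fills in,
# while low-scoring surroundings are never added to the mask.
scores = np.zeros((8, 8))
scores[2:6, 2:6] = 1.0
mask = grow_mask(scores, click=(3, 3))
```

The key design point this illustrates is that the user supplies only the click; the mask's extent is decided by the guidance signal, so the edit is not constrained by pre-existing segmentation boundaries.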