Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Abstract

This paper addresses the problem of adding objects to images using text guidance alone. The task is challenging because the new object must blend seamlessly into the image while staying consistent with its visual context, such as lighting, texture, and spatial location. Existing text-guided image inpainting methods can add objects, but they either fail to preserve background consistency or require cumbersome human intervention, such as specifying bounding boxes or scribbling masks. To tackle this challenge, we introduce Diffree, a Text-to-Image (T2I) model that performs object addition with text control only. To train it, we curate OABench, a carefully constructed synthetic dataset built by removing objects from real images with advanced image inpainting techniques. OABench comprises 74K real-world tuples, each consisting of an original image, an inpainted image with the object removed, an object mask, and object descriptions. Trained on OABench using the Stable Diffusion model with an additional mask prediction module, Diffree uniquely predicts the position of the new object and achieves object addition guided only by text. Extensive experiments demonstrate that Diffree adds new objects with a high success rate while maintaining background consistency, spatial appropriateness, and object relevance and quality.
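To make the shape of the training data concrete, the sketch below shows one way an OABench tuple could be represented and turned into a training example. It is a minimal sketch under stated assumptions, not the authors' code: the class and field names (`OABenchTuple`, `original`, `inpainted`, `mask`, `description`), the toy grayscale grid representation, the single-string description, the `to_training_pair` mapping (conditioning on the object-free image plus text and supervising both the original image and the object mask), and the `background_change` proxy for background consistency are all hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy representation: images are 2-D grids of grayscale values
# and masks are 2-D grids of 0/1 flags. Real OABench samples are RGB images;
# this only illustrates how the four parts of a tuple relate.
Image = List[List[float]]
Mask = List[List[int]]


@dataclass(frozen=True)
class OABenchTuple:
    """One sample laid out as the abstract describes (field names assumed)."""
    original: Image    # real-world image that contains the object
    inpainted: Image   # same image with the object removed by inpainting
    mask: Mask         # binary mask marking where the object was (1 = object)
    description: str   # text describing the object (one string, for simplicity)


def to_training_pair(
    sample: OABenchTuple,
) -> Tuple[Tuple[Image, str], Tuple[Image, Mask]]:
    """Assumed mapping: the model sees the object-free image and the text,
    and is supervised on both the image with the object and its mask."""
    model_input = (sample.inpainted, sample.description)
    targets = (sample.original, sample.mask)
    return model_input, targets


def background_change(before: Image, after: Image, mask: Mask) -> float:
    """Mean absolute pixel change outside the mask (illustrative proxy only).

    0.0 means every background pixel was left untouched; masked pixels
    (where the object goes) are ignored.
    """
    total, count = 0.0, 0
    for row_b, row_a, row_m in zip(before, after, mask):
        for b, a, m in zip(row_b, row_a, row_m):
            if m == 0:
                total += abs(a - b)
                count += 1
    return total / count if count else 0.0


# Usage: a 2x2 scene where only the bottom-right pixel belongs to the object.
sample = OABenchTuple(
    original=[[0.2, 0.2], [0.2, 0.9]],
    inpainted=[[0.2, 0.2], [0.2, 0.2]],
    mask=[[0, 0], [0, 1]],
    description="a red apple",
)
model_input, targets = to_training_pair(sample)
print(background_change(sample.inpainted, sample.original, sample.mask))  # 0.0
```

Treating the mask as a second supervision target mirrors the abstract's point that Diffree predicts where the new object should go. A check like `background_change` only looks at pixels outside the mask, so it says nothing about the relevance or quality of the added object itself.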
