MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

Abstract

We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation. Our method is training-free and does not rely on any label supervision. Two key designs enable us to employ an off-the-shelf text-to-image diffusion model as a useful dataset generator for object instances and mask annotations. First, we divide an image canvas into several regions and perform a single round of diffusion process to generate multiple instances simultaneously, conditioning on different text prompts. Second, we obtain corresponding instance masks by aggregating cross-attention maps associated with object prompts across layers and diffusion time steps, followed by simple thresholding and edge-aware refinement processing. Without bells and whistles, our MosaicFusion can produce a significant amount of synthetic labeled data for both rare and novel categories. Experimental results on the challenging LVIS long-tailed and open-vocabulary benchmarks demonstrate that MosaicFusion can significantly improve the performance of existing instance segmentation models, especially for rare and novel categories. Code will be released at https://github.com/Jiahao000/MosaicFusion.
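
The two designs above can be illustrated with a small NumPy sketch of the bookkeeping around the diffusion model: splitting the canvas into regions, aggregating per-object cross-attention maps, thresholding them, and placing each region mask back on the full canvas. This is not the authors' implementation. The helper names (`split_canvas`, `aggregate_attention`, `attention_to_mask`, `paste_mask`) are hypothetical, and the equal-width strip layout, plain mean over time steps and layers, min-max normalization, and fixed threshold of 0.5 are all assumptions. The edge-aware refinement step is omitted, and no diffusion model is run; the attention maps are assumed to have already been extracted and resized to each region's resolution.

```python
import numpy as np


def split_canvas(height, width, num_regions):
    """Split an H x W canvas into equal-width vertical strips.

    Assumption: a simple 1 x N layout without overlap; the region layout
    used by MosaicFusion may differ. Returns (top, left, bottom, right) boxes.
    """
    edges = np.linspace(0, width, num_regions + 1).astype(int)
    return [(0, int(edges[i]), height, int(edges[i + 1])) for i in range(num_regions)]


def aggregate_attention(attn_maps):
    """Average one object token's cross-attention maps.

    `attn_maps` is assumed to have shape (timesteps, layers, h, w), already
    resized to a common resolution. The result is min-max normalized to [0, 1].
    """
    avg = attn_maps.mean(axis=(0, 1))
    lo, hi = avg.min(), avg.max()
    return (avg - lo) / (hi - lo + 1e-8)


def attention_to_mask(attn_maps, threshold=0.5):
    """Aggregate, then binarize with a fixed (hypothetical) threshold.

    The edge-aware refinement described in the abstract is not included.
    """
    return aggregate_attention(attn_maps) >= threshold


def paste_mask(region_mask, box, canvas_shape):
    """Place a region-level mask into a full-canvas instance mask."""
    top, left, bottom, right = box
    full = np.zeros(canvas_shape, dtype=bool)
    full[top:bottom, left:right] = region_mask
    return full


# Toy usage with synthetic attention maps; real maps would come from the
# diffusion model's cross-attention layers for the object's prompt token.
boxes = split_canvas(64, 128, num_regions=2)  # [(0, 0, 64, 64), (0, 64, 64, 128)]
attn = np.zeros((2, 3, 64, 64))  # 2 time steps, 3 layers, 64 x 64 maps
attn[:, :, 10:20, 10:20] = 1.0
instance_mask = paste_mask(attention_to_mask(attn), boxes[1], (64, 128))
```

Averaging over both time steps and layers is the simplest way to combine maps that individually tend to be noisy; a weighted scheme or a learned threshold would be drop-in replacements for `aggregate_attention` and `attention_to_mask` under this sketch's structure.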