InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

Abstract

In this paper, we introduce a novel paradigm for enhancing the ability of object detectors, e.g., expanding categories or improving detection performance, by training on synthetic datasets generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained generative diffusion model, augmenting it with the ability to localise arbitrary instances in the generated images. The grounding head is trained to align the text embeddings of category names with the regional visual features of the diffusion model, using supervision from an off-the-shelf object detector and a novel self-training scheme for (novel) categories not covered by the detector. This enhanced diffusion model, termed InstaGen, can serve as a data synthesizer for object detection. We conduct thorough experiments showing that object detectors can be enhanced by training on synthetic datasets from InstaGen, demonstrating superior performance over existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to 5.2 AP) scenarios.
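To make the alignment idea concrete, the following is a minimal sketch, not the paper's implementation: it scores each region feature against each category-name embedding by cosine similarity and keeps the best match above a cutoff. The function names, the toy 2-D vectors, and the `threshold` value are all assumptions made for illustration; a real grounding head would operate on learned diffusion features and text-encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_regions(region_feats, text_embeds, threshold=0.5):
    """Label each region with its best-matching category, or None below the threshold.

    region_feats: list of feature vectors, one per candidate region (hypothetical).
    text_embeds:  dict mapping category name -> text embedding (hypothetical).
    threshold:    illustrative similarity cutoff, not a value from the paper.
    """
    labels = []
    for feat in region_feats:
        scores = {name: cosine(feat, emb) for name, emb in text_embeds.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else None)
    return labels


# Toy data: two categories and three regions, the last matching neither category.
text_embeds = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
region_feats = [[0.9, 0.1], [0.1, 0.8], [-1.0, -1.0]]
print(classify_regions(region_feats, text_embeds))
```

Because categories are represented only by their text embeddings, adding a new category in this sketch means adding one more entry to `text_embeds`, which is the property that makes this style of alignment suitable for open-vocabulary settings.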
