InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

Abstract



In this paper, we introduce a novel paradigm for enhancing the capabilities of object detectors, e.g., expanding the set of detectable categories or improving detection performance, by training them on a synthetic dataset generated by diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained generative diffusion model, giving it the ability to localise arbitrary instances in the images it generates. The grounding head is trained to align the text embeddings of category names with the regional visual features of the diffusion model, using supervision from an off-the-shelf object detector together with a novel self-training scheme for (novel) categories that the detector does not cover. This enhanced diffusion model, termed InstaGen, can serve as a data synthesizer for object detection. Thorough experiments show that object detectors trained on the synthetic dataset from InstaGen are enhanced, outperforming existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to 5.2 AP) scenarios.
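The grounding head's core operation, matching category-name text embeddings against regional visual features, and the idea of keeping only confident predictions on categories the detector does not cover can be illustrated with a small sketch. This is not InstaGen's implementation. It assumes temperature-scaled cosine-similarity logits (a common choice in open-vocabulary detection) and a hypothetical confidence `threshold` for selecting pseudo-labels. The function names `region_text_logits` and `pseudo_label`, along with the toy vectors, are illustrative only. In practice, the region features would come from the diffusion model and the text embeddings from a text encoder.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def region_text_logits(
    region_feats: Sequence[Sequence[float]],
    text_embeds: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> List[List[float]]:
    """Cosine similarity of every region with every category text embedding,
    divided by a temperature (assumed scoring rule, not confirmed by the paper)."""
    regions = [l2_normalize(r) for r in region_feats]
    texts = [l2_normalize(t) for t in text_embeds]
    return [
        [sum(a * b for a, b in zip(r, t)) / temperature for t in texts]
        for r in regions
    ]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(
    region_feats: Sequence[Sequence[float]],
    text_embeds: Sequence[Sequence[float]],
    class_names: Sequence[str],
    threshold: float = 0.5,  # hypothetical confidence cut-off
    temperature: float = 0.07,
) -> List[Tuple[int, str, float]]:
    """Give each region its best-matching category name and keep it only if the
    softmax probability reaches `threshold` (a hypothetical self-training filter)."""
    kept = []
    for idx, row in enumerate(region_text_logits(region_feats, text_embeds, temperature)):
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            kept.append((idx, class_names[best], probs[best]))
    return kept


if __name__ == "__main__":
    names = ["cat", "dog", "zebra"]
    texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    regions = [[0.9, 0.1, 0.0], [0.4, 0.4, 0.2]]
    # Region 0 clearly matches "cat". Region 1 is ambiguous between "cat" and
    # "dog", so its top probability falls below 0.5 and it is dropped.
    print(pseudo_label(regions, texts, names))
```

With this threshold-based filtering, ambiguous regions produce no label rather than a noisy one. That trade-off is the usual motivation for confidence filtering in self-training, though the paper's actual selection criterion may differ.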
