Gen2Det: Generate to Detect

Abstract

Recently, diffusion models have shown improved synthetic image quality as well as better control over generation. We motivate and present Gen2Det, a simple modular pipeline that creates synthetic training data for object detection for free by leveraging state-of-the-art grounded image generation methods. Unlike existing works, which generate individual object instances and then require identifying the foreground and pasting it onto other images, we simplify the process by directly generating scene-centric images. In addition to the synthetic data, Gen2Det proposes a suite of techniques to make the best use of the generated data, including image-level filtering, instance-level filtering, and a better training recipe that accounts for imperfections in the generation. Using Gen2Det, we show healthy improvements on object detection and segmentation tasks under various settings, independent of the detection method. In the long-tailed detection setting on LVIS, Gen2Det improves performance on rare categories by a large margin while also significantly improving performance on other categories; for example, we see an improvement of 2.13 Box AP and 1.84 Mask AP over training on real data alone on LVIS with Mask R-CNN. In the low-data regime setting on COCO, Gen2Det consistently improves Box and Mask AP by 2.27 and 1.85 points, respectively. In the most general detection setting, Gen2Det still demonstrates robust performance gains; for example, it improves Box and Mask AP on COCO by 0.45 and 0.32 points, respectively.
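The abstract names image-level and instance-level filtering but does not state the criteria used. As an illustration only, the sketch below assumes one plausible scheme: an annotated box on a generated image is kept if a detector trained on real data finds a same-class box that overlaps it sufficiently, and an image is dropped if too few of its boxes survive. The names `Instance`, `filter_instances`, `filter_image`, and the thresholds `iou_thresh` and `min_keep_ratio` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Instance:
    label: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_instances(annotated: List[Instance],
                     detected: List[Instance],
                     iou_thresh: float = 0.5) -> List[Instance]:
    """Assumed instance-level filter: keep an annotated box only if some
    detection with the same label overlaps it with IoU >= iou_thresh."""
    return [
        ann for ann in annotated
        if any(det.label == ann.label and iou(ann.box, det.box) >= iou_thresh
               for det in detected)
    ]


def filter_image(annotated: List[Instance],
                 detected: List[Instance],
                 min_keep_ratio: float = 0.5,
                 iou_thresh: float = 0.5) -> Optional[List[Instance]]:
    """Assumed image-level filter: reject the image (return None) if it has
    no annotations or if fewer than min_keep_ratio of them survive the
    instance-level filter; otherwise return the surviving instances."""
    if not annotated:
        return None
    kept = filter_instances(annotated, detected, iou_thresh)
    if len(kept) / len(annotated) < min_keep_ratio:
        return None
    return kept


if __name__ == "__main__":
    # Boxes used to condition generation (hypothetical values).
    annotated = [Instance("cat", (0, 0, 10, 10)),
                 Instance("dog", (20, 20, 30, 30))]
    # Output of a detector trained on real data (hypothetical values).
    detected = [Instance("cat", (1, 1, 10, 10))]
    print(filter_image(annotated, detected))
```

In this sketch the generated "dog" box has no matching detection, so only the "cat" instance is retained as a training annotation. Raising `min_keep_ratio` above 0.5 would instead discard the whole image.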
