Gen2Det: Generate to Detect

Abstract

Recently, diffusion models have shown improvements in synthetic image quality as well as better control over generation. We motivate and present Gen2Det, a simple modular pipeline that creates synthetic training data for object detection for free by leveraging state-of-the-art grounded image generation methods. Unlike existing works, which generate individual object instances and then require identifying the foreground and pasting it onto other images, we simplify the process by directly generating scene-centric images. In addition to the synthetic data, Gen2Det proposes a suite of techniques to best utilize the generated data, including image-level filtering, instance-level filtering, and a better training recipe that accounts for imperfections in the generation. Using Gen2Det, we show healthy improvements on object detection and segmentation tasks under various settings, independent of the detection method used. In the long-tailed detection setting on LVIS, Gen2Det improves performance on rare categories by a large margin while also significantly improving performance on other categories; e.g., with Mask R-CNN on LVIS, we see an improvement of 2.13 Box AP and 1.84 Mask AP over training on real data alone. In the low-data regime setting on COCO, Gen2Det consistently improves Box and Mask AP by 2.27 and 1.85 points, respectively. In the most general detection setting, Gen2Det still demonstrates robust performance gains; e.g., it improves Box and Mask AP on COCO by 0.45 and 0.32 points, respectively.
