Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision

Abstract

Detecting vehicles in aerial imagery is a critical task with applications in traffic monitoring, urban planning, and defense intelligence. Deep learning methods have provided state-of-the-art (SOTA) results for this task. However, a significant challenge arises when models trained on data from one geographic region fail to generalize effectively to other areas. Variability in environmental conditions, urban layouts, road networks, vehicle types, and image acquisition parameters (e.g., resolution, lighting, and viewing angle) leads to domain shifts that degrade model performance. This paper proposes a novel method that uses generative AI to synthesize high-quality aerial images together with their labels, improving detector training through data augmentation. Our key contribution is a multi-stage, multi-modal knowledge transfer framework that uses fine-tuned latent diffusion models (LDMs) to mitigate the distribution gap between the source and target environments. Extensive experiments across diverse aerial imagery domains show consistent AP50 improvements of 4-23%, 6-10%, 7-40%, and more than 50% over supervised learning on source-domain data, weakly supervised adaptation methods, unsupervised domain adaptation methods, and open-set object detectors, respectively. Furthermore, we introduce two newly annotated aerial datasets from New Zealand and Utah to support further research in this field. The project page is available at https://humansensinglab.github.io/AGenDA
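
The reported gains are measured in AP50, the average precision of the vehicle class when a detection counts as a true positive only if its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5. The sketch below shows one common way to compute this metric. It assumes axis-aligned boxes in (x1, y1, x2, y2) format, greedy matching in descending score order, and all-point interpolation as in PASCAL VOC 2010 and later. The helper names `iou` and `ap50` are hypothetical and do not come from the paper's code, and the exact evaluation protocol used in the experiments may differ.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap50(
    detections: List[Tuple[str, float, Box]],  # (image_id, score, box)
    ground_truth: Dict[str, List[Box]],        # image_id -> ground-truth boxes
    iou_thr: float = 0.5,
) -> float:
    """Hypothetical helper: all-point interpolated AP at an IoU threshold."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp: List[int] = []
    # Greedily match detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp.append(1)
        else:
            tp.append(0)  # low overlap or duplicate detection: false positive

    # Cumulative precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    cum_tp = 0
    for k, t in enumerate(tp, start=1):
        cum_tp += t
        precisions.append(cum_tp / k)
        recalls.append(cum_tp / n_gt)

    # Make precision non-increasing from right to left (the precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum precision times each recall increment (area under the PR curve).
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


gt = {"a": [(0, 0, 10, 10)], "b": [(0, 0, 10, 10)]}
dets = [
    ("a", 0.9, (0, 0, 10, 10)),  # true positive
    ("a", 0.8, (0, 0, 10, 10)),  # duplicate of a matched box: false positive
    ("b", 0.7, (1, 1, 11, 11)),  # IoU = 81/119, about 0.68: true positive
]
print(round(ap50(dets, gt), 4))  # 0.8333
```

Matching detections greedily by descending score, and counting a second hit on an already matched box as a false positive, is what penalizes duplicate detections. Benchmark toolkits such as the COCO evaluator implement their own variants of this procedure, so their AP50 values can differ slightly from this sketch.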