Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
July 28, 2025
Authors: Xiao Fang, Minhyek Jeon, Zheyang Qin, Stanislav Panev, Celso de Melo, Shuowen Hu, Shayok Chakraborty, Fernando De la Torre
cs.AI
Abstract
Detecting vehicles in aerial imagery is a critical task with applications in
traffic monitoring, urban planning, and defense intelligence. Deep learning
methods have provided state-of-the-art (SOTA) results for this application.
However, a significant challenge arises when models trained on data from one
geographic region fail to generalize effectively to other areas. Variability in
factors such as environmental conditions, urban layouts, road networks, vehicle
types, and image acquisition parameters (e.g., resolution, lighting, and angle)
leads to domain shifts that degrade model performance. This paper proposes a
novel method that uses generative AI to synthesize high-quality aerial images
and their labels, improving detector training through data augmentation. Our
key contribution is the development of a multi-stage, multi-modal knowledge
transfer framework utilizing fine-tuned latent diffusion models (LDMs) to
mitigate the distribution gap between the source and target environments.
Extensive experiments across diverse aerial imagery domains show consistent
performance improvements in AP50 over supervised learning on source domain
data, weakly supervised adaptation methods, unsupervised domain adaptation
methods, and open-set object detectors by 4-23%, 6-10%, 7-40%, and more than
50%, respectively. Furthermore, we introduce two newly annotated aerial
datasets from New Zealand and Utah to support further research in this field.
Project page is available at: https://humansensinglab.github.io/AGenDA
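All of the quantitative comparisons above are reported in AP50, i.e., average precision with detections counted as true positives when they overlap a ground-truth box at IoU ≥ 0.5. As a rough single-image sketch of how that metric works (greedy score-ordered matching and uninterpolated AP; production benchmarks such as COCO aggregate over many images and use interpolated precision), not the paper's actual evaluation code:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def ap50(preds, gts):
    """AP at IoU threshold 0.5 for one image.

    preds: list of (score, box); gts: list of ground-truth boxes.
    Predictions are matched greedily in descending score order; each
    ground-truth box can be claimed by at most one prediction.
    """
    preds = sorted(preds, key=lambda p: -p[0])
    matched = set()
    tp = []  # 1 for true positive, 0 for false positive, in score order
    for score, box in preds:
        best, best_iou = None, 0.5  # only accept matches with IoU >= 0.5
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(box, g)
            if v >= best_iou:
                best, best_iou = j, v
        if best is not None:
            matched.add(best)
            tp.append(1)
        else:
            tp.append(0)
    # Uninterpolated AP: average the precision at each recall step.
    aps, tps = 0.0, 0
    for i, t in enumerate(tp):
        if t:
            tps += 1
            aps += tps / (i + 1)
    return aps / len(gts) if gts else 0.0
```

Under this metric, a detector adapted to the target domain improves AP50 both by finding more target-domain vehicles (recall) and by ranking confident false positives lower (precision).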