

Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision

July 28, 2025
Authors: Xiao Fang, Minhyek Jeon, Zheyang Qin, Stanislav Panev, Celso de Melo, Shuowen Hu, Shayok Chakraborty, Fernando De la Torre
cs.AI

Abstract

Detecting vehicles in aerial imagery is a critical task with applications in traffic monitoring, urban planning, and defense intelligence. Deep learning methods have provided state-of-the-art (SOTA) results for this application. However, a significant challenge arises when models trained on data from one geographic region fail to generalize effectively to other areas. Variability in factors such as environmental conditions, urban layouts, road networks, vehicle types, and image acquisition parameters (e.g., resolution, lighting, and angle) leads to domain shifts that degrade model performance. This paper proposes a novel method that uses generative AI to synthesize high-quality aerial images and their labels, improving detector training through data augmentation. Our key contribution is the development of a multi-stage, multi-modal knowledge transfer framework utilizing fine-tuned latent diffusion models (LDMs) to mitigate the distribution gap between the source and target environments. Extensive experiments across diverse aerial imagery domains show consistent performance improvements in AP50 over supervised learning on source domain data, weakly supervised adaptation methods, unsupervised domain adaptation methods, and open-set object detectors by 4-23%, 6-10%, 7-40%, and more than 50%, respectively. Furthermore, we introduce two newly annotated aerial datasets from New Zealand and Utah to support further research in this field. Project page is available at: https://humansensinglab.github.io/AGenDA
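To make the data-augmentation idea concrete, below is a minimal, hypothetical Python sketch of generating synthetic aerial images with a text-to-image latent diffusion model and collecting them for detector training. It assumes the Hugging Face diffusers library; the checkpoint ID, prompt, and image count are illustrative placeholders, not the paper's setup, and the paper's fine-tuned LDMs and synthesized labels are omitted here.

```python
# Minimal sketch (not the paper's code): augment a vehicle detector's training set
# with synthetic aerial images sampled from a latent diffusion model.
# The checkpoint, prompt, and counts below are illustrative assumptions.
import os

import torch
from diffusers import StableDiffusionPipeline


def generate_synthetic_aerial_images(prompt: str, n_images: int, out_dir: str) -> list[str]:
    """Sample aerial-style images from a (possibly fine-tuned) LDM checkpoint."""
    os.makedirs(out_dir, exist_ok=True)
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # placeholder; the paper fine-tunes its own LDMs
        torch_dtype=torch.float16,
    ).to("cuda")

    paths = []
    for i in range(n_images):
        image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
        path = os.path.join(out_dir, f"synthetic_{i:05d}.png")
        image.save(path)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Mix these synthetic target-domain-style images into the source training set
    # before fine-tuning an off-the-shelf detector (e.g., Faster R-CNN from torchvision).
    synthetic = generate_synthetic_aerial_images(
        prompt="top-down aerial photo of a suburban road with parked cars",
        n_images=100,
        out_dir="./synthetic_aerial",
    )
    print(f"Generated {len(synthetic)} synthetic training images")
```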