

A Robust Deep Networks based Multi-Object Multi-Camera Tracking System for City Scale Traffic

May 1, 2025
Authors: Muhammad Imran Zaman, Usama Ijaz Bajwa, Gulshan Saleem, Rana Hammad Raza
cs.AI

Abstract

Vision sensors are becoming more important in Intelligent Transportation Systems (ITS) for traffic monitoring, management, and optimization as the number of network cameras continues to rise. However, manual object tracking and matching across multiple non-overlapping cameras pose significant challenges in city-scale urban traffic scenarios. These challenges include handling diverse vehicle attributes, occlusions, illumination variations, shadows, and varying video resolutions. To address these issues, we propose an efficient and cost-effective deep learning-based framework for Multi-Object Multi-Camera Tracking (MO-MCT). The proposed framework utilizes Mask R-CNN for object detection and employs Non-Maximum Suppression (NMS) to select target objects from overlapping detections. Transfer learning is employed for re-identification, enabling the association and generation of vehicle tracklets across multiple cameras. Moreover, we leverage appropriate loss functions and distance measures to handle occlusion, illumination, and shadow challenges. The final solution identification module performs feature extraction using ResNet-152 coupled with Deep SORT based vehicle tracking. The proposed framework is evaluated on the 5th AI City Challenge dataset (Track 3), comprising 46 camera feeds. Among these 46 camera streams, 40 are used for model training and validation, while the remaining six are utilized for model testing. The proposed framework achieves competitive performance with an IDF1 score of 0.8289, and precision and recall scores of 0.9026 and 0.8527 respectively, demonstrating its effectiveness in robust and accurate vehicle tracking.
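The abstract's detection stage selects target objects from overlapping Mask R-CNN detections via Non-Maximum Suppression (NMS). The paper does not give its NMS implementation; the following is a minimal greedy-NMS sketch in plain Python, assuming axis-aligned boxes in [x1, y1, x2, y2] format and confidence scores per box (both hypothetical placeholders for the detector's actual outputs).

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, then drop any
    remaining box that overlaps it above iou_thresh; repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```

For example, two heavily overlapping detections of the same vehicle collapse to the higher-scoring one, while a distant detection survives: `nms([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], [0.9, 0.8, 0.7])` keeps indices `[0, 2]`. Production pipelines typically use a batched library implementation rather than this O(n²) loop.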

