

A Robust Deep Networks based Multi-Object Multi-Camera Tracking System for City Scale Traffic

May 1, 2025
Authors: Muhammad Imran Zaman, Usama Ijaz Bajwa, Gulshan Saleem, Rana Hammad Raza
cs.AI

Abstract

Vision sensors are becoming more important in Intelligent Transportation Systems (ITS) for traffic monitoring, management, and optimization as the number of network cameras continues to rise. However, manual object tracking and matching across multiple non-overlapping cameras pose significant challenges in city-scale urban traffic scenarios. These challenges include handling diverse vehicle attributes, occlusions, illumination variations, shadows, and varying video resolutions. To address these issues, we propose an efficient and cost-effective deep learning-based framework for Multi-Object Multi-Camera Tracking (MO-MCT). The proposed framework utilizes Mask R-CNN for object detection and employs Non-Maximum Suppression (NMS) to select target objects from overlapping detections. Transfer learning is employed for re-identification, enabling the association and generation of vehicle tracklets across multiple cameras. Moreover, we leverage appropriate loss functions and distance measures to handle occlusion, illumination, and shadow challenges. The final solution identification module performs feature extraction using ResNet-152 coupled with Deep SORT-based vehicle tracking. The proposed framework is evaluated on the 5th AI City Challenge dataset (Track 3), comprising 46 camera feeds. Among these 46 camera streams, 40 are used for model training and validation, while the remaining six are utilized for model testing. The proposed framework achieves competitive performance with an IDF1 score of 0.8289, and precision and recall scores of 0.9026 and 0.8527, respectively, demonstrating its effectiveness in robust and accurate vehicle tracking.
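
The abstract describes a three-stage pipeline: Mask R-CNN detection filtered by NMS, appearance feature extraction with ResNet-152, and Deep SORT-based association into tracklets. The sketch below is a minimal illustration of the per-frame detection and embedding stages built from off-the-shelf torchvision models (torchvision >= 0.13 API); it is not the authors' implementation, and the ResNet-50-FPN Mask R-CNN backbone, the score/IoU thresholds, and the 224x224 crop size are assumptions chosen purely for illustration. The boxes and 2048-dimensional embeddings it returns are the kind of per-frame input a Deep SORT-style tracker would then associate across frames and cameras.

    # Illustrative sketch only (not the authors' code): per-frame vehicle detection
    # with an off-the-shelf Mask R-CNN, overlap removal with NMS, and ResNet-152
    # appearance embeddings for a downstream Deep SORT-style association step.
    import torch
    import torch.nn.functional as F
    import torchvision
    from torchvision.models.detection import maskrcnn_resnet50_fpn
    from torchvision.ops import nms

    detector = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()   # COCO-pretrained detector
    backbone = torchvision.models.resnet152(weights="DEFAULT")
    encoder = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()  # drop the FC head

    @torch.no_grad()
    def detect_and_embed(frame, score_thr=0.5, iou_thr=0.5):
        # frame: float tensor [3, H, W] with values in [0, 1]
        det = detector([frame])[0]
        keep = det["scores"] > score_thr
        boxes, scores = det["boxes"][keep], det["scores"][keep]
        boxes = boxes[nms(boxes, scores, iou_thr)]                # suppress overlapping detections

        feats = []
        for x1, y1, x2, y2 in boxes.round().int().tolist():
            crop = frame[:, y1:y2, x1:x2].unsqueeze(0)
            crop = F.interpolate(crop, size=(224, 224), mode="bilinear")
            feats.append(encoder(crop).flatten(1))                # [1, 2048] appearance feature
        feats = torch.cat(feats) if feats else torch.empty(0, 2048)
        return boxes, feats

For context, IDF1 is the identity-F1 metric of Ristani et al. (2016), defined as 2·IDTP / (2·IDTP + IDFP + IDFN), so the reported 0.8289 reflects how consistently the same identity is kept on the same vehicle across cameras.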

