Pixel-Level Pavement Damage Assessment Based on Instance Segmentation

Abstract

Automated pavement distress assessment requires more than image-level classification or coarse bounding-box detection: maintenance-relevant quantification depends on precisely localizing thin, branching, and irregular cracks. This paper presents a vision-based pavement distress analysis system built on Mask R-CNN instance segmentation and evaluates it on UWGB-StreetCrack, a custom field-collected roadway image dataset acquired with a vehicle-mounted smartphone and manually annotated with polygon labels for longitudinal cracks, transverse cracks, alligator cracks, and potholes. Five Detectron2-based Mask R-CNN backbone variants were compared under a consistent fine-tuning protocol. The best-performing model, Mask R-CNN with a ResNet-101 FPN backbone, achieved 84.23% precision, 90.04% recall, and an F1 score of 87.04% under the project-specific bounding-box matching protocol. The same model produced an aggregate predicted crack-area fraction of 2.164%, closely matching the ground-truth crack-area fraction of 2.170%. To contextualize the segmentation system against a detector-oriented alternative, a CSPDarknet53-based YOLO detector was also adapted and retrained on the same dataset, where it reached only 27.5% precision and 20.7% recall under the validation protocol. The results indicate that instance segmentation is a practical direction for field pavement imagery and aggregate crack-area estimation, while also exposing open challenges in annotation consistency, class imbalance, confounder rejection, and mask-level benchmarking.
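
The two headline quantities above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the evaluation code used in this work. It assumes that the F1 score is the usual harmonic mean of precision and recall, and that the aggregate crack-area fraction is the number of pixels covered by any instance mask divided by the total number of pixels across all images. The function names `f1_score` and `aggregate_area_fraction`, and the `(height, width, masks)` input layout, are hypothetical.

```python
from typing import List, Sequence, Tuple

# A binary mask is a grid of 0/1 values with the same shape as its image.
Mask = Sequence[Sequence[int]]
# Hypothetical layout: (height, width, instance masks predicted for that image).
ImageMasks = Tuple[int, int, List[Mask]]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def aggregate_area_fraction(images: Sequence[ImageMasks]) -> float:
    """Fraction of all pixels covered by at least one instance mask.

    Assumed definition: overlapping instances are counted once per pixel,
    and images without any instance still contribute to the denominator.
    """
    covered = 0
    total = 0
    for height, width, masks in images:
        total += height * width
        for row in range(height):
            for col in range(width):
                if any(mask[row][col] for mask in masks):
                    covered += 1
    return covered / total if total else 0.0


# Reported precision and recall reproduce the reported F1 score.
print(round(100 * f1_score(0.8423, 0.9004), 2))  # 87.04

# Relative gap between predicted (2.164%) and ground-truth (2.170%) area fractions.
print(round(100 * abs(2.164 - 2.170) / 2.170, 2))  # 0.28

# Toy input: two overlapping instances in a 2x2 image, plus one empty 2x2 image.
toy = [
    (2, 2, [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]),
    (2, 2, []),
]
print(aggregate_area_fraction(toy))  # 0.25
```

Under these assumptions, the reported precision and recall are consistent with the reported F1 score, and the predicted area fraction differs from the ground truth by roughly 0.28% in relative terms. Note that close agreement in aggregate area does not by itself establish per-instance mask accuracy, which is why mask-level benchmarking is listed as an open challenge.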