Pixel-Level Pavement Distress Assessment Based on Instance Segmentation

Abstract

Automated pavement distress assessment requires more than image-level classification or coarse bounding-box detection: thin, branching, and irregular cracks must be localized accurately enough to support maintenance-relevant quantification. This paper presents a vision-based pavement distress analysis system built on Mask R-CNN instance segmentation and evaluates it on UWGB-StreetCrack, a custom field-collected roadway image dataset acquired with a vehicle-mounted smartphone and manually annotated with polygon labels for longitudinal cracks, transverse cracks, alligator cracks, and potholes. Five Detectron2-based Mask R-CNN backbone variants were compared under a consistent fine-tuning protocol. The best-performing model, Mask R-CNN with a ResNet-101 FPN backbone, achieved 84.23% precision, 90.04% recall, and an F1 score of 87.04% under the project-specific bounding-box matching protocol. The same model produced an aggregate predicted crack-area fraction of 2.164%, closely matching the ground-truth crack-area fraction of 2.170%. To contextualize the segmentation system against a detector-oriented alternative, a CSPDarknet53-based YOLO detector was also adapted and retrained on the dataset; it reached 27.5% precision and 20.7% recall under the validation protocol. The results show that instance segmentation is a practical direction for field pavement imagery and aggregate crack-area estimation, while also exposing open challenges in annotation consistency, class imbalance, confounder rejection, and mask-level benchmarking.
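As a quick consistency check on the reported figures, the sketch below recomputes the F1 score from the stated precision and recall and measures the relative gap between the predicted and ground-truth crack-area fractions. The helper `aggregate_area_fraction` is hypothetical: it assumes the aggregate fraction is defined as total crack pixels divided by total image pixels across the dataset, which the abstract does not spell out, and the toy masks are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def aggregate_area_fraction(masks):
    """Assumed definition (not confirmed by the source):
    crack pixels summed over all images divided by total pixels.
    Each mask is a non-empty list of rows containing 0/1 values."""
    crack_pixels = sum(sum(sum(row) for row in mask) for mask in masks)
    total_pixels = sum(len(mask) * len(mask[0]) for mask in masks)
    return crack_pixels / total_pixels


# Figures reported in the abstract.
f1 = f1_score(0.8423, 0.9004)

# Relative gap between predicted (2.164%) and ground-truth (2.170%) area fractions.
rel_gap = abs(2.164 - 2.170) / 2.170

# Toy masks (illustrative only): one 4x5 image with no cracks, one 2x5 image fully cracked.
toy_masks = [
    [[0] * 5 for _ in range(4)],
    [[1] * 5 for _ in range(2)],
]
toy_fraction = aggregate_area_fraction(toy_masks)

print(f"F1 = {f1 * 100:.2f}%")
print(f"relative area gap = {rel_gap * 100:.2f}%")
print(f"toy area fraction = {toy_fraction:.4f}")
```

Under these assumptions the recomputed F1 agrees with the reported 87.04%, and the predicted and ground-truth area fractions differ by well under 1% in relative terms.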