Pixel-Level Pavement Distress Assessment Using Instance Segmentation

Abstract

Automated pavement distress assessment requires more than image-level classification or coarse bounding-box detection: maintenance-relevant quantification depends on accurately localizing thin, branching, and irregular cracks. This paper presents a vision-based pavement distress analysis system built on Mask R-CNN instance segmentation and evaluates it on UWGB-StreetCrack, a custom field-collected roadway image dataset acquired with a vehicle-mounted smartphone and manually annotated with polygon labels for longitudinal cracks, transverse cracks, alligator cracks, and potholes. Five Detectron2-based Mask R-CNN backbone variants were considered under a consistent fine-tuning protocol. The best-performing model, Mask R-CNN with a ResNet-101 FPN backbone, achieved 84.23% precision, 90.04% recall, and an F1 score of 87.04% under the project-specific bounding-box matching protocol. The same model produced an aggregate predicted crack-area fraction of 2.164%, closely matching the ground-truth crack-area fraction of 2.170%. To contextualize the segmentation system against a detector-oriented alternative, a CSPDarknet53-based YOLO detector was also adapted and retrained on the same dataset, reaching 27.5% precision and 20.7% recall on the validation protocol. These results show that instance segmentation is a practical direction for field pavement imagery and aggregate crack-area estimation, while also exposing open challenges in annotation consistency, class imbalance, confounder rejection, and mask-level benchmarking.
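The two headline quantities in the abstract, the F1 score and the aggregate crack-area fraction, can be illustrated with a minimal sketch. The function names below are hypothetical. The definition of the aggregate fraction (total crack pixels divided by total pixels across all images) is an assumption, since the abstract does not spell out how the fraction is pooled. The toy masks are illustrative and are not data from UWGB-StreetCrack.

```python
import numpy as np


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def crack_area_fraction(masks) -> float:
    """Pooled fraction of crack pixels over a collection of binary masks.

    Assumption: "aggregate" means crack pixels summed over all images,
    divided by all pixels summed over all images.
    """
    crack_pixels = sum(int(np.count_nonzero(m)) for m in masks)
    total_pixels = sum(int(m.size) for m in masks)
    return crack_pixels / total_pixels if total_pixels else 0.0


# Consistency check on the reported precision and recall.
f1 = f1_score(0.8423, 0.9004)
print(f"F1 = {100 * f1:.2f}%")  # F1 = 87.04%

# Toy example: two 4x4 masks, one with two crack pixels and one with none.
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[1, 1:3] = True
mask_b = np.zeros((4, 4), dtype=bool)
toy_fraction = crack_area_fraction([mask_a, mask_b])
print(f"toy crack-area fraction = {toy_fraction:.4f}")  # 0.0625

# Relative gap between the reported predicted and ground-truth fractions.
rel_gap = abs(2.164 - 2.170) / 2.170
print(f"relative gap = {100 * rel_gap:.2f}%")
```

The first check confirms that the reported F1 score follows from the reported precision and recall. A pooled area fraction lets over- and under-segmentation in individual images cancel, so close agreement at the aggregate level does not by itself establish per-image or per-instance mask accuracy.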