Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery
October 31, 2025
Authors: Mahmoud El Hussieni, Bahadır K. Güntürk, Hasan F. Ateş, Oğuz Hanoğlu
cs.AI
Abstract
Accurate building instance segmentation and height classification are
critical for urban planning, 3D city modeling, and infrastructure monitoring.
This paper presents a detailed analysis of YOLOv11, a recent advancement in
the YOLO series of deep learning models, focusing on its application to joint
building extraction and discrete height classification from satellite imagery.
YOLOv11 builds on the strengths of earlier YOLO models by introducing a more
efficient architecture that better combines features at different scales,
improves object localization accuracy, and enhances performance in complex
urban scenes. Using the DFC2023 Track 2 dataset, which includes over 125,000
annotated buildings across 12 cities, we evaluate YOLOv11 with metrics such
as precision, recall, F1 score, and mean average precision (mAP).
Our findings demonstrate that YOLOv11 achieves strong instance segmentation
performance with 60.4% mAP@50 and 38.3% mAP@50-95 while maintaining robust
classification accuracy across five predefined height tiers. The model excels
in handling occlusions, complex building shapes, and class imbalance,
particularly for rare high-rise structures. Comparative analysis confirms that
YOLOv11 outperforms earlier multitask frameworks in both detection accuracy and
inference speed, making it well-suited for real-time, large-scale urban
mapping. This research highlights YOLOv11's potential to advance semantic urban
reconstruction through streamlined categorical height modeling, offering
actionable insights for future developments in remote sensing and geospatial
intelligence.
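
The abstract reports precision, recall, F1 score, and mAP@50 without spelling out how they are computed. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. The function names are hypothetical, and masks are represented as sets of pixel coordinates purely for illustration. A predicted building would typically count as a true positive at the "@50" level when its mask IoU with a ground-truth building of the same height tier is at least 0.5.

```python
def mask_iou(pred: set, truth: set) -> float:
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = pred | truth
    if not union:
        return 0.0
    return len(pred & truth) / len(union)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Standard precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def is_match_at_50(pred: set, truth: set, pred_tier: int, truth_tier: int) -> bool:
    """Assumed matching rule: same height tier and mask IoU of at least 0.5."""
    return pred_tier == truth_tier and mask_iou(pred, truth) >= 0.5


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

mAP@50-95 averages the average precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05, which is why it is lower than mAP@50 for the same model.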