Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery
October 31, 2025
Authors: Mahmoud El Hussieni, Bahadır K. Güntürk, Hasan F. Ateş, Oğuz Hanoğlu
cs.AI
Abstract
Accurate building instance segmentation and height classification are
critical for urban planning, 3D city modeling, and infrastructure monitoring.
This paper presents a detailed analysis of YOLOv11, a recent advancement in
the YOLO series of deep learning models, focusing on its application to joint
building extraction and discrete height classification from satellite imagery.
YOLOv11 builds on the strengths of earlier YOLO models by introducing a more
efficient architecture that better combines features at different scales,
improves object localization accuracy, and enhances performance in complex
urban scenes. On the DFC2023 Track 2 dataset, which includes over 125,000
annotated buildings across 12 cities, we evaluate YOLOv11's performance using
metrics such as precision, recall, F1 score, and mean average precision (mAP).
Our findings demonstrate that YOLOv11 achieves strong instance segmentation
performance with 60.4% mAP@50 and 38.3% mAP@50-95 while maintaining robust
classification accuracy across five predefined height tiers. The model excels
in handling occlusions, complex building shapes, and class imbalance,
particularly for rare high-rise structures. Comparative analysis confirms that
YOLOv11 outperforms earlier multitask frameworks in both detection accuracy and
inference speed, making it well-suited for real-time, large-scale urban
mapping. This research highlights YOLOv11's potential to advance semantic urban
reconstruction through streamlined categorical height modeling, offering
actionable insights for future developments in remote sensing and geospatial
intelligence.
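
To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and F1 score could be computed for class-aware detections at an IoU threshold of 0.5 (the matching rule behind mAP@50). It is an illustration under stated assumptions, not the paper's evaluation code: the function names `box_iou` and `precision_recall_f1` are hypothetical, axis-aligned boxes stand in for the instance masks the paper actually evaluates, and height tiers are represented as integer class ids 0 to 4.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds, truths, iou_thr=0.5):
    """Greedy class-aware matching of predictions to ground truth.

    preds:  list of (box, height_class, score)   # assumed format
    truths: list of (box, height_class)          # assumed format
    A prediction is a true positive if it matches an unused ground-truth
    building of the same height class with IoU >= iou_thr.
    """
    matched = set()
    tp = 0
    # Visit predictions from most to least confident.
    for box, cls, _score in sorted(preds, key=lambda p: -p[2]):
        best_j, best_iou = None, iou_thr
        for j, (tbox, tcls) in enumerate(truths):
            if j in matched or tcls != cls:
                continue
            iou = box_iou(box, tbox)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # unmatched predictions
    fn = len(truths) - tp  # missed ground-truth buildings
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1
```

In this sketch, a building that is localized correctly but assigned the wrong height tier counts as both a false positive and a false negative, which is one reason rare tiers such as high-rise structures can weigh heavily on the scores. mAP@50-95 would additionally average precision over IoU thresholds from 0.5 to 0.95 and over all confidence cutoffs, which is omitted here.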