Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery

Abstract

Accurate building instance segmentation and height classification are critical for urban planning, 3D city modeling, and infrastructure monitoring. This paper presents a detailed analysis of YOLOv11, a recent advance in the YOLO series of deep learning models, focusing on its application to joint building extraction and discrete height classification from satellite imagery. YOLOv11 builds on the strengths of earlier YOLO models by introducing a more efficient architecture that combines features across scales more effectively, improves object localization accuracy, and performs better in complex urban scenes. Using the DFC2023 Track 2 dataset, which includes over 125,000 annotated buildings across 12 cities, we evaluate YOLOv11 with precision, recall, F1 score, and mean average precision (mAP). Our findings show that YOLOv11 achieves strong instance segmentation performance, with 60.4% mAP@50 and 38.3% mAP@50–95, while maintaining robust classification accuracy across five predefined height tiers. The model handles occlusions, complex building shapes, and class imbalance well, particularly for rare high-rise structures. Comparative analysis confirms that YOLOv11 outperforms earlier multitask frameworks in both detection accuracy and inference speed, making it well suited to real-time, large-scale urban mapping. This research highlights YOLOv11's potential to advance semantic urban reconstruction through streamlined categorical height modeling and offers actionable insights for future developments in remote sensing and geospatial intelligence.
