Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT Without Pre-Training
October 17, 2025
Author: Aditya Vir
cs.AI
Abstract
This work presents a systematic investigation of custom convolutional neural
network architectures for satellite land-use classification, achieving 97.23%
test accuracy on the EuroSAT dataset without reliance on pre-trained models.
Through three progressive architectural iterations (baseline: 94.30%,
CBAM-enhanced: 95.98%, and balanced multi-task attention: 97.23%), we identify
and address specific failure modes in satellite imagery classification. Our
principal contribution is a novel balanced multi-task attention mechanism that
combines Coordinate Attention for spatial feature extraction with
Squeeze-and-Excitation blocks for spectral feature extraction, unified through
a learnable fusion parameter. Experimental results show that this parameter
converges during training to α ≈ 0.57, indicating near-equal importance of the
spatial and spectral modalities for satellite imagery. We employ progressive
DropBlock regularization (drop rates increasing from 5% to 20% with network
depth) and class-balanced loss weighting to address overfitting and
confusion-pattern imbalance. The final 12-layer architecture achieves a Cohen's
kappa of 0.9692, with every class exceeding 94.46% accuracy, and shows a 24.25%
confidence gap between correct and incorrect predictions, indicating
well-calibrated confidence. Our approach comes within 1.34 percentage points of
a fine-tuned ResNet-50 (98.57%) while requiring no external data, validating the efficacy of
systematic architectural design for domain-specific applications. Complete
code, trained models, and evaluation scripts are publicly available.
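The balanced attention idea can be pictured as two gating branches applied to the same feature map and blended by a single scalar. The NumPy code below is a forward-only sketch under stated assumptions, not the authors' implementation. Several details are hypothetical:

- the class name `BalancedAttention`
- the parallel (rather than sequential) arrangement of the branches
- letting α weight the Coordinate Attention branch
- the parameterization α = sigmoid(logit)
- the reduction ratio and the random weights

The Coordinate Attention branch is also simplified: it uses ReLU where the original design uses batch normalization plus h-swish.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(z, 0.0)


class BalancedAttention:
    """Hypothetical sketch: out = alpha * CA(x) + (1 - alpha) * SE(x), x is (C, H, W)."""

    def __init__(self, channels, reduction=8, alpha_logit=0.0, seed=0):
        rng = np.random.default_rng(seed)
        hidden = max(channels // reduction, 1)
        # Squeeze-and-Excitation weights (channel, i.e. "spectral", branch)
        self.se_w1 = rng.normal(0.0, 0.1, (hidden, channels))
        self.se_w2 = rng.normal(0.0, 0.1, (channels, hidden))
        # Coordinate Attention weights ("spatial" branch)
        self.ca_w1 = rng.normal(0.0, 0.1, (hidden, channels))
        self.ca_wh = rng.normal(0.0, 0.1, (channels, hidden))
        self.ca_ww = rng.normal(0.0, 0.1, (channels, hidden))
        # Assumed fusion parameterization: alpha = sigmoid(logit) stays in (0, 1)
        self.alpha_logit = alpha_logit

    @property
    def alpha(self):
        return float(sigmoid(self.alpha_logit))

    def squeeze_excitation(self, x):
        z = x.mean(axis=(1, 2))                          # (C,) global average pool
        s = sigmoid(self.se_w2 @ relu(self.se_w1 @ z))   # (C,) per-channel gates
        return x * s[:, None, None]

    def coordinate_attention(self, x):
        h = x.shape[1]
        pooled_h = x.mean(axis=2)                        # (C, H): pool along width
        pooled_w = x.mean(axis=1)                        # (C, W): pool along height
        joint = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
        y = relu(self.ca_w1 @ joint)                     # shared 1x1 transform
        g_h = sigmoid(self.ca_wh @ y[:, :h])             # (C, H) row gates
        g_w = sigmoid(self.ca_ww @ y[:, h:])             # (C, W) column gates
        return x * g_h[:, :, None] * g_w[:, None, :]

    def __call__(self, x):
        a = self.alpha
        return a * self.coordinate_attention(x) + (1.0 - a) * self.squeeze_excitation(x)
```

With `alpha_logit=np.log(0.57 / 0.43)` the `alpha` property evaluates to 0.57, the value the abstract reports after training. In a real model the logit would be a trainable parameter updated by backpropagation, not a value set by hand.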
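DropBlock regularizes convolutional features by zeroing contiguous square regions of a feature map rather than independent activations. A "progressive" variant raises the drop rate in deeper layers. The abstract states only that the rate grows from 5% to 20% with depth. The following are illustrative assumptions:

- the linear schedule
- the `block_size`
- the (C, H, W) layout
- the function names `progressive_drop_rates` and `dropblock`

```python
import numpy as np


def progressive_drop_rates(num_blocks, start=0.05, end=0.20):
    """Hypothetical linear schedule: shallow blocks drop less, deep blocks more."""
    return np.linspace(start, end, num_blocks)


def dropblock(x, drop_prob, block_size, rng, training=True):
    """Apply DropBlock to a (C, H, W) feature map; identity at inference."""
    if not training or drop_prob <= 0.0:
        return x
    c, h, w = x.shape
    valid_h, valid_w = h - block_size + 1, w - block_size + 1
    # Seed rate chosen so that roughly drop_prob of all units end up dropped
    gamma = (drop_prob / block_size**2) * (h * w) / (valid_h * valid_w)
    # Sample block top-left corners only where a full block fits
    seeds = np.zeros((c, h, w), dtype=bool)
    seeds[:, :valid_h, :valid_w] = rng.random((c, valid_h, valid_w)) < gamma
    # Grow each seed into a block_size x block_size square of dropped units
    dropped = np.zeros_like(seeds)
    for di in range(block_size):
        for dj in range(block_size):
            dropped[:, di:, dj:] |= seeds[:, :h - di, :w - dj]
    keep = ~dropped
    # Rescale surviving units so the total activation is preserved
    return x * keep * (keep.size / max(int(keep.sum()), 1))
```

For a network with four regularized stages, `progressive_drop_rates(4)` yields rates of 0.05, 0.10, 0.15 and 0.20. Stage `i` would then call `dropblock(features, rates[i], block_size, rng)` during training only.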
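Two of the reported evaluation numbers are worth unpacking:

- **Cohen's kappa** corrects raw accuracy for agreement expected by chance.
- **Confidence gap**: the abstract does not define it. The sketch below assumes it is the mean top-class confidence on correct predictions minus the mean on incorrect ones.

The helper names `cohens_kappa` and `confidence_gap` are illustrative. The 1.34 figure is a difference in percentage points (98.57 − 97.23).

```python
from statistics import mean


def cohens_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the two marginal class frequencies
    p_chance = sum((y_true.count(k) / n) * (y_pred.count(k) / n) for k in labels)
    if p_chance == 1.0:
        return 1.0  # degenerate case: both sides use the same single class
    return (p_observed - p_chance) / (1.0 - p_chance)


def confidence_gap(confidences, y_true, y_pred):
    """Assumed definition: mean confidence when correct minus mean when wrong.

    Raises statistics.StatisticsError if either group is empty.
    """
    correct = [c for c, t, p in zip(confidences, y_true, y_pred) if t == p]
    wrong = [c for c, t, p in zip(confidences, y_true, y_pred) if t != p]
    return mean(correct) - mean(wrong)
```

For labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]`, observed agreement is 0.75 and chance agreement is 0.5, so `cohens_kappa` returns 0.5.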