Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT Without Pre-Training
October 17, 2025
Author: Aditya Vir
cs.AI
Abstract
This work presents a systematic investigation of custom convolutional neural network architectures for satellite land-use classification, achieving 97.23% test accuracy on the EuroSAT dataset without relying on pre-trained models. Through three progressive architectural iterations (baseline: 94.30%; CBAM-enhanced: 95.98%; balanced multi-task attention: 97.23%), we identify and address specific failure modes in satellite imagery classification. Our principal contribution is a novel balanced multi-task attention mechanism that combines Coordinate Attention for spatial feature extraction with Squeeze-and-Excitation blocks for spectral (channel-wise) feature extraction, unified through a learnable fusion parameter α. Experimental results show that this parameter converges autonomously to α ≈ 0.57, indicating that spatial and spectral information are of nearly equal importance for satellite imagery. We employ progressive DropBlock regularization, with drop rates increasing from 5% to 20% with network depth, together with class-balanced loss weighting to address overfitting and imbalanced confusion patterns between classes. The final 12-layer architecture achieves a Cohen's kappa of 0.9692, with every class reaching at least 94.46% accuracy, and shows confidence calibration, with a 24.25% gap in prediction confidence between correct and incorrect predictions. Our approach comes within 1.34 percentage points of a fine-tuned ResNet-50 (98.57%) while requiring no external data, validating systematic architectural design for domain-specific applications. The complete code, trained models, and evaluation scripts are publicly available.
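The balanced multi-task attention described in the abstract can be illustrated with a small, dependency-free sketch. It assumes the fused output is the convex combination α·CA(x) + (1 − α)·SE(x), with α = sigmoid(θ) for a learnable scalar θ. The abstract does not say how α is parameterized, where the block sits in the network, or which reduction ratio is used; those choices, the ReLU in the Coordinate Attention branch (the original design uses batch norm and h-swish), and all function names below are illustrative assumptions, not the author's implementation. A real model would use a tensor library and learn θ by backpropagation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]          # weight matrix, rows x cols
Tensor3 = List[List[List[float]]]   # feature map indexed [channel][row][col]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector (a 1x1 conv at one position)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def squeeze_excitation(x: Tensor3, w1: Matrix, w2: Matrix) -> Tensor3:
    """Channel attention: global average pool, FC (C -> C/r), ReLU, FC (C/r -> C), sigmoid."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    squeezed = [sum(map(sum, x[k])) / (h * w) for k in range(c)]
    hidden = [max(0.0, v) for v in matvec(w1, squeezed)]
    gate = [sigmoid(v) for v in matvec(w2, hidden)]
    return [[[x[k][i][j] * gate[k] for j in range(w)] for i in range(h)] for k in range(c)]


def coordinate_attention(x: Tensor3, w_shared: Matrix, w_h: Matrix, w_w: Matrix) -> Tensor3:
    """Spatial attention factorised into a per-row gate and a per-column gate for each channel."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Directional pooling: average over width (one vector per row), over height (one per column).
    z_h = [[sum(x[k][i]) / w for k in range(c)] for i in range(h)]
    z_w = [[sum(x[k][i][j] for i in range(h)) / h for k in range(c)] for j in range(w)]
    # Shared 1x1 transform C -> C/r; ReLU stands in for the original batch norm + h-swish.
    f_h = [[max(0.0, v) for v in matvec(w_shared, z)] for z in z_h]
    f_w = [[max(0.0, v) for v in matvec(w_shared, z)] for z in z_w]
    # Direction-specific 1x1 transforms C/r -> C followed by sigmoid gates.
    a_h = [[sigmoid(v) for v in matvec(w_h, f)] for f in f_h]   # a_h[row][channel]
    a_w = [[sigmoid(v) for v in matvec(w_w, f)] for f in f_w]   # a_w[col][channel]
    return [[[x[k][i][j] * a_h[i][k] * a_w[j][k] for j in range(w)] for i in range(h)]
            for k in range(c)]


def balanced_attention(x: Tensor3, theta: float,
                       se_weights: Tuple[Matrix, Matrix],
                       ca_weights: Tuple[Matrix, Matrix, Matrix]) -> Tensor3:
    """Hypothetical fusion: alpha * CA(x) + (1 - alpha) * SE(x), with alpha = sigmoid(theta)."""
    alpha = sigmoid(theta)   # learning the unconstrained theta keeps alpha inside (0, 1)
    spatial = coordinate_attention(x, *ca_weights)
    spectral = squeeze_excitation(x, *se_weights)
    return [[[alpha * s + (1.0 - alpha) * p for s, p in zip(row_s, row_p)]
             for row_s, row_p in zip(ch_s, ch_p)]
            for ch_s, ch_p in zip(spatial, spectral)]
```

Under this parameterization, the reported α ≈ 0.57 corresponds to θ = ln(0.57 / 0.43) ≈ 0.28, so the spatial branch receives weight 0.57 and the spectral branch 0.43.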
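The progressive DropBlock regularization mentioned in the abstract can be sketched as a per-stage drop-rate schedule plus the DropBlock masking step. The linear interpolation from 5% to 20%, the number of stages, the block size, and the function names are assumptions for illustration; the abstract only states that the rate grows from 5% to 20% with depth. The seed-rate formula γ = drop_prob / block_size² · feat_size² / (feat_size − block_size + 1)² follows the original DropBlock paper (Ghiasi et al., 2018).

```python
import random
from typing import List

Grid = List[List[float]]


def progressive_drop_rates(num_stages: int, start: float = 0.05, end: float = 0.20) -> List[float]:
    """Hypothetical linear schedule: drop probability grows from `start` to `end` with depth."""
    if num_stages == 1:
        return [start]
    step = (end - start) / (num_stages - 1)
    return [start + step * s for s in range(num_stages)]


def dropblock_gamma(drop_prob: float, block_size: int, feat_size: int) -> float:
    """Seed rate from the DropBlock paper, chosen so roughly `drop_prob` of units are dropped."""
    valid = feat_size - block_size + 1
    return drop_prob / block_size ** 2 * feat_size ** 2 / valid ** 2


def dropblock(x: Grid, drop_prob: float, block_size: int, rng: random.Random) -> Grid:
    """Training-time DropBlock on one square feature map; the identity when drop_prob is 0."""
    n = len(x)
    gamma = dropblock_gamma(drop_prob, block_size, n)
    mask = [[1.0] * n for _ in range(n)]
    # Sample seeds only where a whole block fits; zero the block whose top-left corner is the seed.
    for i in range(n - block_size + 1):
        for j in range(n - block_size + 1):
            if rng.random() < gamma:
                for di in range(block_size):
                    for dj in range(block_size):
                        mask[i + di][j + dj] = 0.0
    kept = sum(map(sum, mask))
    if kept == 0:
        return [[0.0] * n for _ in range(n)]
    scale = n * n / kept   # rescale by total / kept units, as in the DropBlock paper
    return [[x[i][j] * mask[i][j] * scale for j in range(n)] for i in range(n)]
```

For example, `progressive_drop_rates(4)` gives approximately `[0.05, 0.10, 0.15, 0.20]`, so a shallow stage drops few activations while the deepest stage, where overfitting to spatially correlated features is more likely, drops the most. At inference time the block would simply be skipped.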
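The abstract does not specify how the class-balanced loss weights are computed. One common choice, shown here purely as an assumption, is the effective-number weighting w_c ∝ (1 − β) / (1 − β^{n_c}) from Cui et al. (2019), plugged into a weighted cross-entropy. The weights could equally be derived from validation confusion patterns; the function names and the default β are hypothetical.

```python
import math
from typing import List, Sequence


def class_balanced_weights(counts: Sequence[int], beta: float = 0.999) -> List[float]:
    """Effective-number weights (1 - beta) / (1 - beta**n_c), rescaled to sum to the class count."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    total = sum(raw)
    return [w * len(counts) / total for w in raw]


def weighted_cross_entropy(logits: List[List[float]], labels: Sequence[int],
                           weights: Sequence[float]) -> float:
    """Cross-entropy where each sample is scaled by the weight of its true class.

    The sum is divided by the total weight of the batch, matching the mean reduction
    of a weighted cross-entropy such as PyTorch's nn.CrossEntropyLoss(weight=...).
    """
    total_loss, total_weight = 0.0, 0.0
    for row, y in zip(logits, labels):
        m = max(row)   # subtract the max logit for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total_loss += weights[y] * (log_z - row[y])
        total_weight += weights[y]
    return total_loss / total_weight
```

Classes with fewer training samples receive larger weights, so their misclassifications contribute more to the gradient; with equal class counts every weight is 1 and the loss reduces to ordinary cross-entropy.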
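Cohen's kappa corrects accuracy for chance agreement: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected from the true and predicted class frequencies alone. The sketch below computes κ and per-class accuracy from a confusion matrix; treating "per-class accuracy" as per-class recall is an assumption, and the helper names are illustrative. As a rough consistency check, if the ten EuroSAT classes are close to balanced in the test set then p_e ≈ 0.1, and κ ≈ (0.9723 − 0.1) / (1 − 0.1) ≈ 0.969, in line with the reported 0.9692.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def cohens_kappa(cm: List[List[int]]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e); undefined when p_e == 1 (e.g. a single class)."""
    k = len(cm)
    n = sum(map(sum, cm))
    p_o = sum(cm[i][i] for i in range(k)) / n                    # observed agreement (accuracy)
    rows = [sum(cm[i]) for i in range(k)]                         # true-class totals
    cols = [sum(cm[i][j] for i in range(k)) for j in range(k)]    # predicted-class totals
    p_e = sum(r * c for r, c in zip(rows, cols)) / (n * n)       # chance agreement
    return (p_o - p_e) / (1.0 - p_e)


def per_class_accuracy(cm: List[List[int]]) -> List[float]:
    """Fraction of each true class predicted correctly (per-class recall)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
```

For instance, with true labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]`, accuracy is 0.75 but chance agreement is 0.5, so κ = 0.5; the "every class at least 94.46%" claim corresponds to `min(per_class_accuracy(cm)) >= 0.9446`.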
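The 24.25% confidence gap can be read as the difference between the average confidence on correctly and incorrectly classified test images. The sketch below assumes confidence means the top softmax probability and that the gap is a difference of means; the abstract does not define either, and the function names are hypothetical. Note that this statistic measures how well confidence separates right from wrong predictions; it is distinct from calibration-error metrics such as expected calibration error.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                          # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def confidence_gap(logits: List[List[float]], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (mean confidence when correct, mean confidence when wrong, difference).

    Confidence is the top softmax probability; an empty group yields NaN.
    """
    correct: List[float] = []
    wrong: List[float] = []
    for row, y in zip(logits, labels):
        probs = softmax(row)
        pred = max(range(len(probs)), key=probs.__getitem__)
        if pred == y:
            correct.append(probs[pred])
        else:
            wrong.append(probs[pred])
    mean_correct = sum(correct) / len(correct) if correct else math.nan
    mean_wrong = sum(wrong) / len(wrong) if wrong else math.nan
    return mean_correct, mean_wrong, mean_correct - mean_wrong
```

A large positive gap means low-confidence predictions are a useful signal for flagging likely errors, for example when routing uncertain tiles to manual review.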