ChatPaper.ai


Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers

December 27, 2025
Authors: Atakan Işık, Selin Vulga Işık, Ahmet Feridun Işık, Mahşuk Taylan
cs.AI

Abstract

Respiratory sound classification is hindered by the limited size, high noise levels, and severe class imbalance of benchmark datasets like ICBHI 2017. While Transformer-based models offer powerful feature extraction capabilities, they are prone to overfitting and often converge to sharp minima in the loss landscape when trained on such constrained medical data. To address this, we introduce a framework that enhances the Audio Spectrogram Transformer (AST) using Sharpness-Aware Minimization (SAM). Instead of merely minimizing the training loss, our approach optimizes the geometry of the loss surface, guiding the model toward flatter minima that generalize better to unseen patients. We also implement a weighted sampling strategy to handle class imbalance effectively. Our method achieves a state-of-the-art score of 68.10% on the ICBHI 2017 dataset, outperforming existing CNN and hybrid baselines. More importantly, it reaches a sensitivity of 68.31%, a crucial improvement for reliable clinical screening. Further analysis using t-SNE and attention maps confirms that the model learns robust, discriminative features rather than memorizing background noise.
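The two-step update behind Sharpness-Aware Minimization described above can be illustrated on a toy loss. This is a minimal sketch, not the paper's implementation: the loss, learning rate `lr`, and radius `rho` are illustrative assumptions (the paper applies SAM to an Audio Spectrogram Transformer, not a scalar problem).

```python
# Toy sketch of Sharpness-Aware Minimization (SAM) on a 1-D loss
# f(w) = (w - 3)^2. Hyperparameters lr and rho are assumed values.

def grad(w):
    # Gradient of the toy loss (w - 3)^2.
    return 2.0 * (w - 3.0)

def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: (1) ascend to an approximate worst-case point
    inside an L2 ball of radius rho, (2) descend at the original
    weights using the gradient measured at that perturbed point."""
    g = grad(w)
    # Step 1: perturbation toward higher loss (probes sharpness).
    eps = rho * g / (abs(g) + 1e-12)
    # Step 2: gradient at the perturbed point drives the update.
    g_adv = grad(w + eps)
    return w - lr * g_adv

w = 0.0
for _ in range(100):
    w = sam_step(w)
print(w)  # settles near the flat minimum at w = 3
```

Because the update uses the gradient at the worst nearby point rather than at the current weights, the iterate is steered away from sharp minima, which is the mechanism the abstract credits for the improved patient-level generalization.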
PDF · January 2, 2026