ChatPaper.ai

Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers

December 27, 2025
Authors: Atakan Işık, Selin Vulga Işık, Ahmet Feridun Işık, Mahşuk Taylan
cs.AI

Abstract

Respiratory sound classification is hindered by the limited size, high noise levels, and severe class imbalance of benchmark datasets like ICBHI 2017. While Transformer-based models offer powerful feature extraction capabilities, they are prone to overfitting and often converge to sharp minima in the loss landscape when trained on such constrained medical data. To address this, we introduce a framework that enhances the Audio Spectrogram Transformer (AST) using Sharpness-Aware Minimization (SAM). Instead of merely minimizing the training loss, our approach optimizes the geometry of the loss surface, guiding the model toward flatter minima that generalize better to unseen patients. We also implement a weighted sampling strategy to handle class imbalance effectively. Our method achieves a state-of-the-art score of 68.10% on the ICBHI 2017 dataset, outperforming existing CNN and hybrid baselines. More importantly, it reaches a sensitivity of 68.31%, a crucial improvement for reliable clinical screening. Further analysis using t-SNE and attention maps confirms that the model learns robust, discriminative features rather than memorizing background noise.
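The core idea of Sharpness-Aware Minimization described in the abstract — perturb the weights toward the locally worst-case point, then descend using the gradient taken there — can be sketched in a framework-agnostic way. The toy quadratic loss, gradient function, and hyperparameter values below are illustrative assumptions, not the paper's actual AST training configuration.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step (sketch, not the paper's code):
    1) ascend to the worst-case point w_adv = w + rho * g / ||g||,
    2) update w with the gradient evaluated at w_adv.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid div-by-zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)  # gradient at the perturbed weights
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy example: L(w) = sum(w_i^2), so grad L = 2w (assumed for illustration).
grad_fn = lambda w: [2.0 * wi for wi in w]
w = [1.0, -0.5]
for _ in range(200):
    w = sam_step(w, grad_fn)
```

Because the descent direction is computed at the perturbed point, minima that are sharp (where the gradient changes quickly under a small weight perturbation) are penalized relative to flat ones — the geometric bias the abstract attributes to better patient-level generalization.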
January 2, 2026
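The weighted sampling strategy mentioned in the abstract for class imbalance is commonly implemented by drawing training examples with probability inversely proportional to their class frequency. A minimal sketch with stdlib tools (the class names and 4:1 imbalance ratio below are illustrative, not the ICBHI 2017 distribution):

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    # Weight each sample by 1 / (count of its class), so every class
    # contributes equal total probability mass to the sampler.
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

# Hypothetical imbalanced dataset: 8 "normal" vs 2 "crackle" recordings.
labels = ["normal"] * 8 + ["crackle"] * 2
weights = inverse_frequency_weights(labels)

random.seed(0)
draws = random.choices(labels, weights=weights, k=1000)
minority_share = draws.count("crackle") / len(draws)  # close to 0.5
```

With inverse-frequency weights, the minority class is drawn roughly as often as the majority class during training, which is one standard way to keep sensitivity from collapsing on rare pathological sounds.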