Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders
August 31, 2024
Authors: Georgios Ioannides, Adrian Kieback, Aman Chadha, Aaron Elkins
cs.AI
Abstract
Speech-based depression detection poses significant challenges for automated detection due to its unique manifestation across individuals and data scarcity. Addressing these challenges, we introduce DAAMAudioCNNLSTM and DAAMAudioTransformer, two parameter-efficient and explainable models for audio feature extraction and depression detection. DAAMAudioCNNLSTM features a novel CNN-LSTM framework with a multi-head Density Adaptive Attention Mechanism (DAAM) that focuses dynamically on informative speech segments. DAAMAudioTransformer replaces the CNN-LSTM architecture with a transformer encoder and incorporates the same DAAM module for enhanced attention and interpretability. These approaches not only improve detection robustness and interpretability but also achieve state-of-the-art performance on the DAIC-WOZ dataset: an F1 macro score of 0.702 for DAAMAudioCNNLSTM and 0.72 for DAAMAudioTransformer, without relying during training/validation on supplementary information, such as vowel positions and speaker information, used by previous approaches. Both models' explainability and efficiency in leveraging speech signals for depression detection represent a leap toward more reliable, clinically useful diagnostic tools, promising advances in speech-based mental health care. To foster further research in this domain, we make our code publicly available.
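To illustrate the idea of density-adaptive gating described in the abstract, the sketch below shows one plausible interpretation: each attention head learns a Gaussian density over its channel group and amplifies activations near the learned mode. This is a minimal, hypothetical NumPy sketch, not the authors' implementation; the function name `daam_gate` and the per-head parameters `mu`/`sigma` are assumptions for illustration only.

```python
import numpy as np

def daam_gate(x, mu, sigma, n_heads=4):
    """Hypothetical multi-head density-adaptive gate (sketch, not the paper's code).

    x     : (T, C) array of frame-level speech features.
    mu    : length-n_heads array, learned mode per head (assumed parameter).
    sigma : length-n_heads array, learned spread per head (assumed parameter).
    Returns an array of the same shape, with each head's channels scaled by an
    unnormalized Gaussian density of their activations.
    """
    heads = np.split(x, n_heads, axis=1)  # split channels into per-head groups
    gated = []
    for h, xh in enumerate(heads):
        z = (xh - mu[h]) / sigma[h]
        dens = np.exp(-0.5 * z ** 2)      # density-based gate in (0, 1]
        gated.append(xh * dens)           # emphasize activations near the mode
    return np.concatenate(gated, axis=1)
```

In this reading, segments whose features lie far from a head's learned distribution are attenuated, which is one way a model could "focus dynamically on informative speech segments" while remaining interpretable (the gate values themselves can be inspected per frame and head).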