Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders
August 31, 2024
Authors: Georgios Ioannides, Adrian Kieback, Aman Chadha, Aaron Elkins
cs.AI
Abstract
Speech-based depression detection poses significant challenges for automated detection due to its unique manifestation across individuals and to data scarcity. Addressing these challenges, we introduce DAAMAudioCNNLSTM and DAAMAudioTransformer, two parameter-efficient and explainable models for audio feature extraction and depression detection. DAAMAudioCNNLSTM features a novel CNN-LSTM framework with a multi-head Density Adaptive Attention Mechanism (DAAM) that focuses dynamically on informative speech segments. DAAMAudioTransformer, which replaces the CNN-LSTM architecture with a transformer encoder, incorporates the same DAAM module for enhanced attention and interpretability. These approaches not only enhance detection robustness and interpretability but also achieve state-of-the-art performance on the DAIC-WOZ dataset: an F1 macro score of 0.702 for DAAMAudioCNNLSTM and 0.72 for DAAMAudioTransformer, without relying on supplementary information such as vowel positions and speaker information during training/validation, as previous approaches do. The explainability and efficiency of both models in leveraging speech signals for depression detection represent a leap towards more reliable, clinically useful diagnostic tools, promising advances in speech-based mental health care. To foster further research in this domain, we make our code publicly available.
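
The abstract does not spell out DAAM's internals. One plausible reading of a density adaptive attention mechanism is a set of learnable Gaussian gates over feature activations, where each head amplifies segments whose normalized activations fall near its learned density peak. The PyTorch sketch below illustrates that interpretation; the class name DensityAdaptiveAttention, its parameters (mean_offset, log_var, num_heads), and the exact gating formula are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn


class DensityAdaptiveAttention(nn.Module):
    """Hypothetical sketch of multi-head density adaptive attention.

    Each head learns a Gaussian density (a mean offset and a variance)
    over feature activations normalized along time. Activations near the
    learned density peak receive gate values close to 1, so informative
    speech segments are emphasized. This is an illustrative
    reconstruction, not the authors' reference implementation.
    """

    def __init__(self, num_heads: int = 4, eps: float = 1e-6):
        super().__init__()
        self.num_heads = num_heads
        self.eps = eps
        # Learnable per-head Gaussian parameters (assumed, two scalars per head).
        self.mean_offset = nn.Parameter(torch.zeros(num_heads))
        self.log_var = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); features must split evenly across heads.
        b, t, f = x.shape
        assert f % self.num_heads == 0, "feature dim must divide by num_heads"
        # Split features into heads: (batch, heads, time, features_per_head).
        xh = x.view(b, t, self.num_heads, f // self.num_heads).transpose(1, 2)
        # Standardize each head's activations over the time axis.
        mu = xh.mean(dim=2, keepdim=True)
        sd = xh.std(dim=2, keepdim=True) + self.eps
        z = (xh - mu) / sd
        # Gaussian gate centred at a learnable offset, with learnable variance.
        var = self.log_var.exp().view(1, self.num_heads, 1, 1)
        off = self.mean_offset.view(1, self.num_heads, 1, 1)
        gate = torch.exp(-((z - off) ** 2) / (2.0 * var))
        # Modulate the features and merge heads back to the input shape.
        return (xh * gate).transpose(1, 2).reshape(b, t, f)


# Usage example with made-up shapes: 8 utterances, 120 frames, 64-dim features.
daam = DensityAdaptiveAttention(num_heads=4)
feats = torch.randn(8, 120, 64)
weighted = daam(feats)  # (8, 120, 64), salient regions amplified
```

In this reading, DAAM acts as a feature-level gate rather than token-to-token attention, which keeps each head down to two learnable scalars and makes the learned densities directly inspectable; that would be consistent with the abstract's emphasis on parameter efficiency and explainability, though the actual module may differ.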