Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification

July 27, 2024
Author: Santosh V. Patapati
cs.AI

Abstract

Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients (MFCCs) and Facial Action Units, and uses GPT-4 with two-shot learning to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves strong results on both the DAIC-WOZ AVEC 2016 Challenge cross-validation split and the Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.
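The evaluation protocol and metrics named in the abstract can be illustrated with a short sketch. This is not the author's evaluation code: the function names (`loso_folds`, `binary_metrics`, `f1_from_pr`), the per-sample subject IDs, and the label convention (1 = depressed, 0 = not depressed) are hypothetical assumptions. It shows how Leave-One-Subject-Out folds keep every sample from one subject out of training, and how precision, recall, and F1 relate. F1 is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def loso_folds(subject_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield one (held-out subject, train indices, test indices) fold per subject.

    Hypothetical helper: every sample of the held-out subject goes to the test
    set, so no subject appears in both training and testing within a fold.
    """
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = [i for i, sid in enumerate(subject_ids) if sid != held_out]
        yield held_out, train_idx, test_idx


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (assumed 1 = depressed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from_pr(precision, recall),
    }
```

For example, `list(loso_folds(["s1", "s1", "s2"]))` yields `[("s1", [2], [0, 1]), ("s2", [0, 1], [2])]`, and `f1_from_pr(0.80, 0.9286)` returns approximately 0.8595, consistent with the reported F1-Score of 85.95% given the reported precision of 80% and recall of 92.86%.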