Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification

July 27, 2024
Author: Santosh V. Patapati
cs.AI

Abstract

Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel, BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients and Facial Action Units, and uses a two-shot learning based GPT-4 model to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves strong results on the DAIC-WOZ AVEC 2016 Challenge cross-validation split and the Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.
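
The reported Leave-One-Subject-Out figures can be cross-checked against each other, since the F1-Score is the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic, not code from the paper; the function name `f1_score` is an illustrative assumption, and the inputs are the rounded percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract (Leave-One-Subject-Out testing).
reported_precision = 0.80
reported_recall = 0.9286
reported_f1 = 0.8595

computed_f1 = f1_score(reported_precision, reported_recall)

# The computed value should match the reported F1 up to rounding.
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

Under these inputs the computed value agrees with the reported 85.95% to within rounding, so the three figures are mutually consistent.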
