Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification

Abstract

Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel BiLSTM-based tri-modal, model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture combines Mel Frequency Cepstral Coefficients (audio features) and Facial Action Units (visual features) with a GPT-4 model, used with two-shot learning, that processes the text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. The model surpasses all baseline models and multiple state-of-the-art models on both the DAIC-WOZ AVEC 2016 Challenge cross-validation split and a Leave-One-Subject-Out cross-validation split. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.
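
As a quick consistency check on the reported Leave-One-Subject-Out metrics, the F1-Score can be recomputed as the harmonic mean of precision and recall. The sketch below assumes the standard binary F1 definition; the function name `f1_from_precision_recall` is illustrative and not taken from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard binary F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (80% and 92.86%).
f1 = f1_from_precision_recall(0.80, 0.9286)
print(f"F1 = {f1:.4f}")  # agrees with the reported 85.95% after rounding
```

Under that assumption, the recomputed value matches the reported F1-Score, so the three figures are mutually consistent.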
