MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders

Abstract

Mental health disorders are among the most serious diseases in the world. Most people with such a disease lack access to adequate care, which highlights the importance of training models for the diagnosis and treatment of mental health disorders. However, in the mental health domain, privacy concerns limit access to personalized treatment data, making it challenging to build powerful models. In this paper, we introduce MentalArena, a self-play framework that trains language models by generating domain-specific personalized data. The result is a better model that can provide personalized diagnosis and treatment (as a therapist) and provide information (as a patient). To accurately model human-like mental health patients, we devise the Symptom Encoder, which simulates a real patient from both cognitive and behavioral perspectives. To address intent bias during patient-therapist interactions, we propose the Symptom Decoder, which compares diagnosed symptoms with encoded symptoms and dynamically manages the dialogue between patient and therapist according to the identified deviations. We evaluated MentalArena on 6 benchmarks, including biomedical QA and mental health tasks, against 6 advanced models. Our models, fine-tuned on GPT-3.5 and on Llama-3-8b, significantly outperform their counterparts, including GPT-4o. We hope that our work can inspire future research on personalized care. Code is available at https://github.com/Scarelette/MentalArena/tree/main
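The abstract describes the Symptom Decoder only at a high level. It compares diagnosed symptoms with encoded symptoms and steers the dialogue according to the deviations it finds. The following is a minimal sketch of that comparison step under stated assumptions, not the authors' implementation. It assumes that symptoms are represented as sets of plain-text labels. It also assumes that a "deviation" is the pair of set differences: symptoms encoded but not yet diagnosed, and symptoms diagnosed but never encoded. The names `SymptomDeviation`, `decode_symptoms`, and `next_turn_guidance`, and the wording of the steering notes, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SymptomDeviation:
    """Gap between the patient's encoded symptoms and the therapist's diagnosis (hypothetical)."""
    missed: set[str] = field(default_factory=set)    # encoded by the patient, not yet diagnosed
    spurious: set[str] = field(default_factory=set)  # diagnosed by the therapist, never encoded

    @property
    def aligned(self) -> bool:
        # No deviation in either direction means the diagnosis matches the encoding.
        return not self.missed and not self.spurious


def decode_symptoms(encoded: set[str], diagnosed: set[str]) -> SymptomDeviation:
    """Compare encoded and diagnosed symptoms via set differences (assumed representation)."""
    return SymptomDeviation(missed=encoded - diagnosed, spurious=diagnosed - encoded)


def next_turn_guidance(deviation: SymptomDeviation) -> str:
    """Turn a deviation into a steering note for the next dialogue turn (hypothetical policy)."""
    if deviation.aligned:
        return "Diagnosis matches the encoded symptoms; proceed to treatment."
    notes = []
    if deviation.missed:
        # Ask the simulated patient to surface symptoms the therapist has not picked up.
        notes.append("Patient should elaborate on: " + ", ".join(sorted(deviation.missed)))
    if deviation.spurious:
        # Ask the therapist to revisit symptoms the patient never actually exhibited.
        notes.append("Therapist should re-examine: " + ", ".join(sorted(deviation.spurious)))
    return "; ".join(notes)


if __name__ == "__main__":
    dev = decode_symptoms({"insomnia", "low mood", "fatigue"}, {"low mood", "anxiety"})
    print(next_turn_guidance(dev))
```

Consider encoded symptoms {insomnia, low mood, fatigue} and a diagnosis of {low mood, anxiety}. Here the guidance asks the patient to elaborate on fatigue and insomnia and asks the therapist to re-examine anxiety. Exact set matching is a simplification chosen for clarity. A realistic system would likely need fuzzy or model-based matching of free-text symptom descriptions.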