Preference Learning Unlocks LLMs' Psycho-Counseling Skills

Abstract

Applying large language models (LLMs) to assist in psycho-counseling is an emerging and meaningful approach, driven by the significant gap between patient needs and the availability of mental health support. However, current LLMs struggle to consistently provide effective responses to client utterances, largely because they lack supervision from high-quality real psycho-counseling data, which is typically inaccessible due to client privacy concerns. Furthermore, the quality of therapists' responses in the sessions that are available can vary significantly with their professional training and experience, and assessing that quality remains an open challenge. In this work, we address these challenges by first proposing a set of professional and comprehensive principles for evaluating therapists' responses to client utterances. Using these principles, we create a preference dataset, PsychoCounsel-Preference, which contains 36k high-quality preference comparison pairs. This dataset aligns with the preferences of professional psychotherapists, providing a robust foundation for evaluating and improving LLMs in psycho-counseling. Experiments on reward modeling and preference learning demonstrate that PsychoCounsel-Preference is an excellent resource for LLMs to acquire the essential skills for responding to clients in a counseling session. Our best-aligned model, PsychoCounsel-Llama3-8B, achieves an impressive win rate of 87% against GPT-4o. We release PsychoCounsel-Preference, PsychoCounsel-Llama3-8B, and the reward model PsychoCounsel Llama3-8B-Reward to facilitate research on psycho-counseling with LLMs at: https://hf.co/Psychotherapy-LLM.
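
As a rough illustration of how preference comparison pairs can supervise a reward model, the sketch below scores a (chosen, rejected) pair with a Bradley-Terry style pairwise loss and measures how often the chosen response receives the higher reward. The abstract does not state which objective was used, so the loss form, the function names, and the toy scores are all assumptions made for illustration only.

```python
import math


def bradley_terry_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one.

    Assumed objective (not confirmed by the abstract): -log(sigmoid(r_c - r_r)).
    """
    margin = chosen_reward - rejected_reward
    # Numerically stable form of -log(sigmoid(margin)), i.e. softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(pairs):
    """Fraction of (chosen, rejected) reward pairs ranked in the preferred order."""
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Hypothetical reward scores for three preference pairs (illustrative values only).
toy_pairs = [(1.2, 0.3), (0.1, 0.8), (2.0, -0.5)]
mean_loss = sum(bradley_terry_loss(c, r) for c, r in toy_pairs) / len(toy_pairs)
accuracy = pairwise_accuracy(toy_pairs)
```

The loss shrinks as the reward for the preferred response rises above the reward for the rejected one, which is the property a reward model trained on comparison pairs is meant to learn.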
