Preference Learning Unlocks LLMs' Psycho-Counseling Skills

Abstract

Applying large language models (LLMs) to assist in psycho-counseling is an emerging and meaningful approach, driven by the significant gap between patient needs and the availability of mental health support. However, current LLMs struggle to consistently provide effective responses to client statements, largely because they lack supervision from high-quality real psycho-counseling data, which is typically inaccessible due to client privacy concerns. Furthermore, the quality of therapists' responses in available sessions can vary significantly with their professional training and experience, and assessing that quality remains an open challenge. In this work, we address these challenges by first proposing a set of professional and comprehensive principles for evaluating therapists' responses to client statements. Using these principles, we create a preference dataset, PsychoCounsel-Preference, which contains 36k high-quality preference comparison pairs. This dataset aligns with the preferences of professional psychotherapists, providing a robust foundation for evaluating and improving LLMs in psycho-counseling. Experiments on reward modeling and preference learning demonstrate that PsychoCounsel-Preference is an excellent resource for LLMs to acquire essential skills for responding to clients in a counseling session. Our best-aligned model, PsychoCounsel-Llama3-8B, achieves an impressive win rate of 87% against GPT-4o. We release PsychoCounsel-Preference, PsychoCounsel-Llama3-8B, and the reward model PsychoCounsel Llama3-8B-Reward to facilitate research on psycho-counseling with LLMs at https://hf.co/Psychotherapy-LLM.
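For readers unfamiliar with the terms, the following is a minimal sketch of two quantities the abstract refers to: a pairwise loss for training a reward model on preference pairs, and a head-to-head win rate. It assumes the widely used Bradley-Terry formulation; the abstract does not state which objective the authors use, and the function names `bradley_terry_loss` and `win_rate` are hypothetical, chosen only for illustration.

```python
import math


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the preferred response outranks the other.

    Assumption: a Bradley-Terry model, where
    P(chosen beats rejected) = sigmoid(reward_chosen - reward_rejected).
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), split by sign so exp() never overflows.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def win_rate(judgments: list[bool]) -> float:
    """Fraction of pairwise comparisons in which the evaluated model won."""
    if not judgments:
        raise ValueError("need at least one comparison")
    return sum(judgments) / len(judgments)
```

The loss shrinks as the reward model scores the preferred response further above the rejected one, which is what lets a dataset of comparison pairs act as a training signal without absolute quality labels.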

