CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy
October 17, 2024
Authors: Mian Zhang, Xianjun Yang, Xinlu Zhang, Travis Labrum, Jamie C. Chiu, Shaun M. Eack, Fei Fang, William Yang Wang, Zhiyu Zoey Chen
cs.AI
Abstract
There is a significant gap between patient needs and available mental health
support today. In this paper, we aim to thoroughly examine the potential of
using Large Language Models (LLMs) to assist professional psychotherapy. To
this end, we propose a new benchmark, CBT-BENCH, for the systematic evaluation
of cognitive behavioral therapy (CBT) assistance. CBT-BENCH comprises three
levels of tasks: (I) basic CBT knowledge acquisition, assessed with
multiple-choice questions; (II) cognitive model understanding, assessed with
cognitive distortion classification, primary core belief classification, and
fine-grained core belief classification; and (III) therapeutic response
generation, assessed by generating responses to patient utterances in CBT
therapy sessions. These tasks cover key aspects of CBT that could potentially
be enhanced through AI assistance, and together they outline a hierarchy of
capability requirements, ranging from basic knowledge recitation to engaging
in real therapeutic conversations. We evaluate representative LLMs on the
benchmark. Experimental results indicate that while LLMs perform well at
reciting CBT knowledge, they fall short in complex real-world scenarios that
require deep analysis of patients' cognitive structures and the generation of
effective responses, which points to directions for future work.
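To make the three-level task hierarchy concrete, here is a minimal Python sketch of how an evaluation harness might organize the tasks and score the automatically gradable ones. The task identifiers, the choice of accuracy for Level I and micro-averaged F1 over label sets for the Level II classification tasks, and the example distortion labels are all assumptions for illustration; the abstract does not specify the metrics. Level III is left without an automatic metric here, on the assumption that free-form therapeutic responses would be judged by humans or another model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical task registry mirroring the three levels described in the abstract.
# Task names and metric choices are illustrative assumptions, not the benchmark's own.


def accuracy(golds: list[str], preds: list[str]) -> float:
    """Fraction of items whose predicted option equals the gold option (Level I)."""
    if not golds:
        return 0.0
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)


def micro_f1(golds: list[set[str]], preds: list[set[str]]) -> float:
    """Micro-averaged F1 over label sets, assuming multi-label classification (Level II)."""
    tp = fp = fn = 0
    for g, p in zip(golds, preds):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but wrong
        fn += len(g - p)  # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


@dataclass(frozen=True)
class Task:
    level: str
    name: str
    # None marks a task with no automatic metric (free-form generation).
    metric: Optional[Callable[..., float]]


TASKS = [
    Task("I", "multiple_choice_qa", accuracy),
    Task("II", "cognitive_distortion_classification", micro_f1),
    Task("II", "primary_core_belief_classification", micro_f1),
    Task("II", "fine_grained_core_belief_classification", micro_f1),
    Task("III", "therapeutic_response_generation", None),
]

if __name__ == "__main__":
    # Example labels are standard CBT distortion names, used here only for illustration.
    gold = [{"catastrophizing"}, {"mind reading", "labeling"}]
    pred = [{"catastrophizing"}, {"labeling", "overgeneralization"}]
    print(round(micro_f1(gold, pred), 3))  # prints 0.667
```

In the usage example, two of three predicted labels are correct and two of three gold labels are recovered, so precision and recall are both 2/3 and the F1 is 0.667. Keeping Level III metric-free in the registry reflects the abstract's point that generation is the hardest level to assess, rather than any scoring rule stated by the authors.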