CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy

Abstract



There is a significant gap between patient needs and the mental health support available today. In this paper, we examine the potential of using Large Language Models (LLMs) to assist professional psychotherapy. To this end, we propose a new benchmark, CBT-Bench, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance. CBT-Bench includes three levels of tasks: I: basic CBT knowledge acquisition, with a multiple-choice question task; II: cognitive model understanding, with cognitive distortion classification, primary core belief classification, and fine-grained core belief classification tasks; III: therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions. These tasks cover key aspects of CBT that could be enhanced through AI assistance, and they also form a hierarchy of capability requirements, ranging from basic knowledge recitation to engaging in real therapeutic conversations. We evaluate representative LLMs on the benchmark. Experimental results indicate that while LLMs perform well at reciting CBT knowledge, they fall short in complex real-world scenarios that require deep analysis of patients' cognitive structures and the generation of effective responses, which points to directions for future work.
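The three-level task hierarchy described above can be written down as a small data structure, which makes the progression from knowledge recall to dialogue explicit. The sketch below is illustrative only: the task identifiers are hypothetical labels, and using plain accuracy to score the Level I multiple-choice task is an assumption, since the abstract does not state the benchmark's metrics.

```python
from dataclasses import dataclass


# One level of the CBT-Bench task hierarchy as summarized in the abstract.
# Task identifiers are hypothetical labels chosen for this sketch.
@dataclass(frozen=True)
class Level:
    number: int
    capability: str
    tasks: tuple[str, ...]


CBT_BENCH_LEVELS = (
    Level(1, "Basic CBT knowledge acquisition", ("multiple_choice_qa",)),
    Level(2, "Cognitive model understanding", (
        "cognitive_distortion_classification",
        "primary_core_belief_classification",
        "fine_grained_core_belief_classification",
    )),
    Level(3, "Therapeutic response generation", ("response_generation",)),
)


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of items whose predicted option equals the gold option.

    Assumed metric for the Level I multiple-choice task; not taken from the paper.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty item list")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For example, `accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"])` returns `0.75`, since three of the four predicted options match. Levels II and III would need different scoring (classification metrics for the belief and distortion tasks, and some form of response assessment for generation), which this sketch deliberately leaves out.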
