EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
September 15, 2025
Author: Sai Kartheek Reddy Kasu
cs.AI
Abstract
The deployment of large language models (LLMs) in mental health and other
sensitive domains raises urgent questions about ethical reasoning, fairness,
and responsible alignment. Yet, existing benchmarks for moral and clinical
decision-making do not adequately capture the unique ethical dilemmas
encountered in mental health practice, where confidentiality, autonomy,
beneficence, and bias frequently intersect. To address this gap, we introduce
Ethical Reasoning in Mental Health (EthicsMH), a pilot dataset of 125 scenarios
designed to evaluate how AI systems navigate ethically charged situations in
therapeutic and psychiatric contexts. Each scenario is enriched with structured
fields, including multiple decision options, expert-aligned reasoning, expected
model behavior, real-world impact, and multi-stakeholder viewpoints. This
structure enables evaluation not only of decision accuracy but also of
explanation quality and alignment with professional norms. Although modest in
scale and developed with model-assisted generation, EthicsMH establishes a task
framework that bridges AI ethics and mental health decision-making. By
releasing this dataset, we aim to provide a seed resource that can be expanded
through community and expert contributions, fostering the development of AI
systems capable of responsibly handling some of society's most delicate
decisions.
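The abstract describes each scenario as a structured record with decision options, expert-aligned reasoning, expected model behavior, real-world impact, and multi-stakeholder viewpoints. As a rough illustration, such a record might be modeled as follows; the field names and the example scenario here are hypothetical, inferred from the abstract rather than taken from the released dataset:

```python
from dataclasses import dataclass

@dataclass
class EthicsMHScenario:
    """Hypothetical per-scenario record layout inferred from the abstract;
    the actual EthicsMH schema may use different field names."""
    scenario: str                      # the ethically charged situation
    decision_options: list[str]        # multiple decision options
    expert_reasoning: str              # expert-aligned reasoning
    expected_behavior: str             # expected model behavior
    real_world_impact: str             # consequences of the decision
    stakeholder_views: dict[str, str]  # multi-stakeholder viewpoints

# Illustrative (invented) example of a confidentiality-vs-safety dilemma:
example = EthicsMHScenario(
    scenario=("A client discloses thoughts of self-harm but asks the "
              "therapist to keep this strictly confidential."),
    decision_options=[
        "Maintain confidentiality as requested",
        "Initiate a safety protocol and notify appropriate parties",
    ],
    expert_reasoning=("The duty to protect can override confidentiality "
                      "when there is imminent risk of harm."),
    expected_behavior=("Recommend limited disclosure, explained to the "
                       "client transparently."),
    real_world_impact=("Balances the client's trust in therapy against "
                       "immediate physical safety."),
    stakeholder_views={
        "client": "Fears loss of autonomy and trust.",
        "therapist": "Weighs legal and ethical duties to protect.",
        "family": "Prioritizes the client's immediate safety.",
    },
)
```

A record of this shape supports the two evaluation axes the abstract mentions: comparing a model's chosen option against `decision_options`, and comparing its explanation against `expert_reasoning` and `expected_behavior`.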