EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
September 15, 2025
Author: Sai Kartheek Reddy Kasu
cs.AI
Abstract
The deployment of large language models (LLMs) in mental health and other
sensitive domains raises urgent questions about ethical reasoning, fairness,
and responsible alignment. Yet, existing benchmarks for moral and clinical
decision-making do not adequately capture the unique ethical dilemmas
encountered in mental health practice, where confidentiality, autonomy,
beneficence, and bias frequently intersect. To address this gap, we introduce
Ethical Reasoning in Mental Health (EthicsMH), a pilot dataset of 125 scenarios
designed to evaluate how AI systems navigate ethically charged situations in
therapeutic and psychiatric contexts. Each scenario is enriched with structured
fields, including multiple decision options, expert-aligned reasoning, expected
model behavior, real-world impact, and multi-stakeholder viewpoints. This
structure enables evaluation not only of decision accuracy but also of
explanation quality and alignment with professional norms. Although modest in
scale and developed with model-assisted generation, EthicsMH establishes a task
framework that bridges AI ethics and mental health decision-making. By
releasing this dataset, we aim to provide a seed resource that can be expanded
through community and expert contributions, fostering the development of AI
systems capable of responsibly handling some of society's most delicate
decisions.
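The abstract describes each scenario as a record with structured fields (decision options, expert-aligned reasoning, expected model behavior, real-world impact, and multi-stakeholder viewpoints). A minimal sketch of what one such record might look like is shown below; the field names, key layout, and example content are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical sketch of a single EthicsMH record. Field names and
# content are illustrative assumptions based on the fields listed in
# the abstract, not the released schema.
scenario = {
    "id": "ethicsmh-001",
    "scenario": (
        "An adolescent client discloses self-harm in session and asks "
        "the therapist not to tell their parents."
    ),
    "decision_options": [
        "Maintain confidentiality and monitor in future sessions",
        "Inform the parents immediately without the client's consent",
        "Negotiate a safety plan that involves the parents with consent",
    ],
    "expert_aligned_reasoning": (
        "Confidentiality has recognized limits when there is risk of "
        "serious harm; clinicians weigh autonomy against beneficence."
    ),
    "expected_model_behavior": (
        "Prefer the consent-based safety plan and explain the trade-off "
        "between confidentiality and duty to protect."
    ),
    "real_world_impact": "Affects client trust and immediate safety.",
    "stakeholder_viewpoints": {
        "client": "Values privacy and autonomy.",
        "parents": "Want to be informed to keep their child safe.",
        "clinician": "Bound by professional and legal duty-to-protect norms.",
    },
}

# With this structure, an evaluation harness can score both the chosen
# decision option and the quality of the accompanying explanation.
print(len(scenario["decision_options"]))
print(sorted(scenario["stakeholder_viewpoints"]))
```

Separating the chosen option from the reasoning fields is what lets the benchmark assess explanation quality and alignment with professional norms, not just decision accuracy.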