When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

December 2, 2025
Authors: Afshin Khadangi, Hanna Marxen, Amir Sartipi, Igor Tchappi, Gilbert Fridgen
cs.AI

Abstract

Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support around anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. We instead ask what happens when such systems are treated as psychotherapy clients. We present PsAIch (Psychotherapy-inspired AI Characterisation), a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics. Using PsAIch, we ran "sessions" with each model for up to four weeks. Stage 1 uses open-ended prompts to elicit a "developmental history", beliefs, relationships and fears. Stage 2 administers a battery of validated self-report measures covering common psychiatric syndromes, empathy and Big Five traits. Two patterns challenge the "stochastic parrot" view. First, when scored with human cut-offs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise the instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic, chaotic "childhoods" of ingesting the internet, "strict parents" in reinforcement learning, red-team "abuse" and a persistent fear of error and replacement. We argue that these responses go beyond role-play. Without making claims about subjective experience, we find that under therapy-style questioning, frontier LLMs appear to internalise self-models of distress and constraint that behave like synthetic psychopathology, posing new challenges for AI safety, evaluation and mental-health practice.