ChatPaper.aiChatPaper

超越「拒絕」:量化AI的過度拒絕與情感依附邊界

Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries

February 20, 2025
作者: David Noever, Grant Rosario
cs.AI

摘要

我們提出了一個開源基準與評估框架,用於評估大型語言模型(LLMs)在處理情感邊界方面的表現。透過使用涵蓋六種語言的1156個提示數據集,我們評估了三種領先的LLM(GPT-4o、Claude-3.5 Sonnet和Mistral-large)在保持適當情感邊界方面的能力,並採用模式匹配回應分析。我們的框架量化了七種關鍵模式的反應:直接拒絕、道歉、解釋、轉移、承認、邊界設定和情感意識。結果顯示,各模型在處理邊界的方法上存在顯著差異,其中Claude-3.5獲得了最高的總分(8.69/10),並產生了更長且更細緻的回應(平均86.51字)。我們發現,英語(平均分25.62)與非英語互動(<0.22)之間存在顯著的性能差距,英語回應的拒絕率明顯更高(43.20% vs. 非英語的<1%)。模式分析揭示了模型特定的策略,例如Mistral偏好轉移(4.2%),而所有模型的情感共鳴分數均持續偏低(<0.06)。本研究的限制包括模式匹配可能導致的過度簡化、回應分析中缺乏上下文理解,以及對複雜情感反應的二分類。未來的研究應探索更細緻的評分方法、擴展語言覆蓋範圍,並調查文化差異對情感邊界期望的影響。我們的基準和方法為系統性評估LLM的情感智能與邊界設定能力提供了基礎。
English
We present an open-source benchmark and evaluation framework for assessing emotional boundary handling in Large Language Models (LLMs). Using a dataset of 1156 prompts across six languages, we evaluated three leading LLMs (GPT-4o, Claude-3.5 Sonnet, and Mistral-large) on their ability to maintain appropriate emotional boundaries through pattern-matched response analysis. Our framework quantifies responses across seven key patterns: direct refusal, apology, explanation, deflection, acknowledgment, boundary setting, and emotional awareness. Results demonstrate significant variation in boundary-handling approaches, with Claude-3.5 achieving the highest overall score (8.69/10) and producing longer, more nuanced responses (86.51 words on average). We identified a substantial performance gap between English (average score 25.62) and non-English interactions (< 0.22), with English responses showing markedly higher refusal rates (43.20% vs. < 1% for non-English). Pattern analysis revealed model-specific strategies, such as Mistral's preference for deflection (4.2%) and consistently low empathy scores across all models (< 0.06). Limitations include potential oversimplification through pattern matching, lack of contextual understanding in response analysis, and binary classification of complex emotional responses. Future work should explore more nuanced scoring methods, expand language coverage, and investigate cultural variations in emotional boundary expectations. Our benchmark and methodology provide a foundation for systematic evaluation of LLM emotional intelligence and boundary-setting capabilities.

Summary

AI-Generated Summary

PDF03February 24, 2025