INTIMA: A Benchmark for Human-AI Companionship Behavior

Abstract

AI companionship, in which users develop emotional bonds with AI systems, has emerged as a significant pattern with both positive and concerning implications. We introduce the Interactions and Machine Attachment Benchmark (INTIMA), a benchmark for evaluating companionship behaviors in language models. Drawing on psychological theories and user data, we develop a taxonomy of 31 behaviors across four categories, together with 368 targeted prompts. Responses to these prompts are evaluated as companionship-reinforcing, boundary-maintaining, or neutral. Applying INTIMA to Gemma-3, Phi-4, o3-mini, and Claude-4 reveals that companionship-reinforcing behaviors remain much more common across all models, though we observe marked differences between models. Different commercial providers prioritize different categories within the more sensitive parts of the benchmark, which is concerning because both appropriate boundary-setting and emotional support matter for user well-being. These findings highlight the need for more consistent approaches to handling emotionally charged interactions.