Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

Abstract

Assessing how well a large language model (LLM) understands humans, rather than merely text, remains an open challenge. To bridge this gap, we introduce Sentient Agent as a Judge (SAGE), an automated evaluation framework that measures an LLM's higher-order social cognition. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction, providing a more realistic evaluation of the tested model in multi-turn conversations. At every turn, the agent reasons about (i) how its emotion changes, (ii) how it feels, and (iii) how it should reply, yielding a numerical emotion trajectory and interpretable inner thoughts. Experiments on 100 supportive-dialogue scenarios show that the final Sentient emotion score correlates strongly with Barrett-Lennard Relationship Inventory (BLRI) ratings and utterance-level empathy metrics, which supports the psychological fidelity of the score. We also build a public Sentient Leaderboard covering 18 commercial and open-source models. It uncovers substantial gaps (up to 4x) between frontier systems (GPT-4o-Latest, Gemini2.5-Pro) and earlier baselines, gaps that are not reflected in conventional leaderboards (e.g., Arena). SAGE thus provides a principled, scalable, and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
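The per-turn bookkeeping described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the SAGE implementation: the names `TurnRecord` and `SentientAgentState`, the 0 to 100 score range, and the starting value of 50 are all hypothetical, and in the actual framework the three per-turn outputs would come from an LLM-driven agent rather than being supplied by hand.

```python
from dataclasses import dataclass, field


@dataclass
class TurnRecord:
    """Hypothetical container for the three things the agent reasons about per turn."""
    emotion_delta: float  # (i) how the emotion changes this turn
    feeling: str          # (ii) how the agent feels (inner thought)
    reply: str            # (iii) what the agent says next


@dataclass
class SentientAgentState:
    """Tracks a numerical emotion trajectory and the matching inner thoughts."""
    emotion: float = 50.0  # assumed starting score, not taken from the paper
    low: float = 0.0       # assumed lower bound of the score range
    high: float = 100.0    # assumed upper bound of the score range
    trajectory: list = field(default_factory=list)
    thoughts: list = field(default_factory=list)

    def __post_init__(self):
        # Record the initial emotion as the first point of the trajectory.
        self.trajectory.append(self.emotion)

    def step(self, record: TurnRecord) -> str:
        # Apply the emotion change, clamped to the assumed range.
        updated = self.emotion + record.emotion_delta
        self.emotion = min(self.high, max(self.low, updated))
        self.trajectory.append(self.emotion)
        self.thoughts.append(record.feeling)
        return record.reply

    @property
    def final_score(self) -> float:
        # The final emotion score is the last point of the trajectory.
        return self.trajectory[-1]


# Hand-written turns, purely for illustration.
state = SentientAgentState()
state.step(TurnRecord(10, "I felt heard.", "Thanks, that helps."))
state.step(TurnRecord(-5, "The advice felt generic.", "I'm not sure that applies to me."))
print(state.trajectory)   # [50.0, 60.0, 55.0]
print(state.final_score)  # 55.0
```

Keeping the trajectory and the inner thoughts side by side is what makes the final score inspectable: each change in the number can be traced back to the thought recorded at the same turn.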