Do LLMs "Feel"? Emotion Circuits Discovery and Control
October 13, 2025
Authors: Chenxi Wang, Yixuan Zhang, Ruiji Yu, Yufei Zheng, Lang Gao, Zirui Song, Zixiang Xu, Gus Xia, Huishuai Zhang, Dongyan Zhao, Xiuying Chen
cs.AI
Abstract
As the demand for emotional intelligence in large language models (LLMs)
grows, a key challenge lies in understanding the internal mechanisms that give
rise to emotional expression and in controlling emotions in generated text.
This study addresses three core questions: (1) Do LLMs contain context-agnostic
mechanisms shaping emotional expression? (2) What form do these mechanisms
take? (3) Can they be harnessed for universal emotion control? We first
construct a controlled dataset, SEV (Scenario-Event with Valence), to elicit
comparable internal states across emotions. Subsequently, we extract
context-agnostic emotion directions that reveal consistent, cross-context
encoding of emotion (Q1). Through analytical decomposition and causal analysis,
we identify neurons and attention heads that locally implement emotional
computation, and validate their causal roles via ablation and enhancement
interventions. Next, we quantify each sublayer's causal influence on the
model's final emotion representation and integrate the identified local
components into coherent global emotion circuits that drive emotional
expression (Q2). Directly modulating these circuits achieves 99.65%
emotion-expression accuracy on the test set, surpassing prompting- and
steering-based methods (Q3). To our knowledge, this is the first systematic
study to uncover and validate emotion circuits in LLMs, offering new insights
into interpretability and controllable emotional intelligence.
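The abstract describes two concrete mechanics: extracting context-agnostic emotion directions from hidden states elicited by matched scenarios, and modulating the model's internals at inference time to control the expressed emotion. The sketch below illustrates that general recipe only; it is not the authors' method or released code. It uses a simple difference-of-means direction and a residual-stream steering hook, and the model name, layer index, scaling factor, and toy prompts are all assumptions.

```python
# Minimal sketch (assumptions throughout): estimate an emotion direction as a
# difference of mean hidden states, then add it to one layer's residual stream
# during generation. Not the paper's circuit-level intervention.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumption: any decoder-only LLM
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto")
model.eval()

LAYER = 16  # assumption: a mid-depth layer; the paper localizes components causally


@torch.no_grad()
def mean_last_token_hidden(prompts, layer=LAYER):
    """Mean hidden state of the final token at one layer, averaged over prompts."""
    states = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(model.device)
        out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[layer][0, -1, :])
    return torch.stack(states).mean(dim=0)


# Toy SEV-style scenario pairs (illustrative, not from the dataset): same event,
# different emotional valence.
angry_prompts = [
    "My colleague took credit for my work in the meeting.",
    "The airline canceled my flight and refused any refund.",
]
neutral_prompts = [
    "My colleague summarized my work in the meeting.",
    "The airline rescheduled my flight to a later time.",
]

# One simple estimator of a context-agnostic emotion direction: difference of means.
emotion_dir = mean_last_token_hidden(angry_prompts) - mean_last_token_hidden(neutral_prompts)
emotion_dir = emotion_dir / emotion_dir.norm()


def make_steering_hook(direction, alpha=8.0):
    """Add alpha * direction to the hidden states output by one decoder layer."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype).to(hidden.device)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook


# The layer path below matches LLaMA-style models; other architectures differ.
handle = model.model.layers[LAYER].register_forward_hook(make_steering_hook(emotion_dir))
ids = tok("Describe your day at work.", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=60)[0], skip_special_tokens=True))
handle.remove()  # restore unsteered behavior
```

A hook-based intervention like this is the usual baseline the paper compares against ("steering-based methods"); the reported 99.65% accuracy comes from directly modulating the identified circuit components rather than adding a single direction.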