The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models
May 7, 2026
Authors: Chonghan Qin, Xiachong Feng, Ziyun Song, Xiaocheng Feng, Jing Xiong, Lingpeng Kong
cs.AI
Abstract
Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of such roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning. We show that they do. We define a contrast-based Granularity Axis as the difference between mean macro- and micro-role hidden states. In Qwen3-8B, this axis aligns with the principal axis (PC1) of the role representation space at cosine 0.972 and accounts for 52.6% of its variance, indicating that granularity is the dominant geometric axis organizing prompted social roles. We construct 75 social roles across five granularity levels and collect 91,200 role-conditioned responses over shared questions and prompt variants, then extract role-level hidden states and project them onto the axis. Role projections increase monotonically across all five levels, remain stable across layers, prompt variants, endpoint definitions, held-out splits, and score-filtered subsets, and transfer to Llama-3.1-8B-Instruct. The axis is also causally relevant: activation steering along it shifts response granularity in the predicted direction, with Llama moving from 2.00 to 3.17 on a five-point macro scale under positive steering on prompts that admit local responses. The two models differ in controllability, suggesting that steering depends on each model's default operating regime. Overall, our findings suggest that social role granularity is not merely a stylistic surface feature, but a structured, ordered, and causally manipulable latent direction in role-conditioned language model behavior.
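The contrast-based axis construction described above can be illustrated with a minimal sketch. This is not the paper's actual pipeline: the array shapes, fabricated role states, and steering coefficient below are illustrative assumptions; in the paper, role-level hidden states are extracted from a model's activations over role-conditioned responses.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden-state dimension (illustrative; real models use e.g. 4096)

# Toy role-level hidden states: rows are roles at the two endpoint
# granularity levels. Here they are fabricated, with macro roles shifted
# so the two groups are separable along some direction.
micro_states = rng.normal(0.0, 1.0, size=(15, d))        # micro-level roles
macro_states = rng.normal(0.0, 1.0, size=(15, d)) + 0.5  # macro-level roles

# Granularity Axis: difference between mean macro- and micro-role states.
axis = macro_states.mean(axis=0) - micro_states.mean(axis=0)
axis /= np.linalg.norm(axis)  # normalize so projections are comparable

# Project role states onto the axis; macro roles should score higher.
micro_proj = micro_states @ axis
macro_proj = macro_states @ axis
print(micro_proj.mean() < macro_proj.mean())  # True by construction

# Activation-steering sketch: nudge a hidden state along the axis.
alpha = 2.0  # steering strength (hypothetical value)
h = rng.normal(size=d)
h_steered = h + alpha * axis
```

In the paper's intermediate granularity levels, the same projection is applied to roles not used to define the axis, and monotonically increasing projections across levels are what establish the axis as ordered rather than merely binary.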