The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models

Abstract

Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of such roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning. We show that they do. We define a contrast-based Granularity Axis as the difference between mean macro- and micro-role hidden states. In Qwen3-8B, this axis aligns with the principal axis (PC1) of the role representation space with a cosine similarity of 0.972 and accounts for 52.6% of its variance, indicating that granularity is the dominant geometric axis organizing prompted social roles. We construct 75 social roles across five granularity levels and collect 91,200 role-conditioned responses over shared questions and prompt variants, then extract role-level hidden states and project them onto the axis. Role projections increase monotonically across all five levels, remain stable across layers, prompt variants, endpoint definitions, held-out splits, and score-filtered subsets, and transfer to Llama-3.1-8B-Instruct. The axis is also causally relevant: activation steering along it shifts response granularity in the predicted direction, with Llama moving from 2.00 to 3.17 on a five-point macro scale under positive steering on prompts that admit local responses. The two models differ in controllability, suggesting that steering depends on each model's default operating regime. Overall, our findings suggest that social role granularity is not merely a stylistic surface feature, but a structured, ordered, and causally manipulable latent direction in role-conditioned language model behavior.
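
The measurement steps summarized above can be sketched in NumPy. The sketch builds a contrast axis from mean macro and micro states and compares it with PC1. It then measures the variance share along the axis, checks that per-level mean projections are monotonic, and shifts a state along the axis. This is a minimal illustration under stated assumptions, not the authors' code. The role-level hidden states are synthetic random vectors, not states extracted from Qwen3-8B or Llama-3.1-8B-Instruct. Levels 1 and 5 are assumed to be the micro and macro endpoints. The helper names (granularity_axis, first_pc, variance_share, steer) are hypothetical. Steering is shown only as adding a scaled unit axis vector to a single hidden state; in a real model this would be applied to a chosen layer's activations during generation, which the sketch does not do.

```python
import numpy as np


def granularity_axis(H, levels, micro=1, macro=5):
    """Hypothetical helper: mean macro-role state minus mean micro-role state, unit-normalized."""
    diff = H[levels == macro].mean(axis=0) - H[levels == micro].mean(axis=0)
    return diff / np.linalg.norm(diff)


def first_pc(H):
    """Hypothetical helper: first principal direction of the mean-centered states (via SVD)."""
    _, _, vt = np.linalg.svd(H - H.mean(axis=0), full_matrices=False)
    return vt[0]


def variance_share(H, direction):
    """Hypothetical helper: fraction of total variance of H along a unit direction."""
    Hc = H - H.mean(axis=0)
    return float(np.var(Hc @ direction) / np.var(Hc, axis=0).sum())


def steer(h, axis, alpha):
    """Hypothetical helper: shift a hidden state by alpha along the unit axis."""
    return h + alpha * axis


# ASSUMPTION: synthetic stand-in for role-level hidden states,
# 75 roles split into 5 granularity levels of 15 roles each.
rng = np.random.default_rng(0)
dim = 64
levels = np.repeat(np.arange(1, 6), 15)
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)
H = np.outer(2.0 * (levels - 3), true_dir) + rng.normal(scale=0.3, size=(75, dim))

axis = granularity_axis(H, levels)
pc1 = first_pc(H)
cos_pc1 = abs(float(axis @ pc1))  # PC sign is arbitrary, so compare magnitudes
share = variance_share(H, axis)

# Project every role onto the axis and check that level means increase 1 -> 5.
proj = H @ axis
level_means = [proj[levels == k].mean() for k in range(1, 6)]
monotonic = all(a < b for a, b in zip(level_means, level_means[1:]))

# Positive steering: because the axis is unit length, the projection rises by exactly alpha.
h_steered = steer(H[0], axis, alpha=4.0)
print(f"cos(axis, PC1) = {cos_pc1:.3f}, variance share = {share:.3f}, monotonic = {monotonic}")
```

Taking the absolute cosine reflects that SVD can return PC1 with either sign. On real model states, the cosine, the variance share and the monotonicity check are exactly the quantities the abstract reports. The synthetic data here only demonstrates the computation, not the reported values.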