The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models

Abstract

Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of such roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning. We show that they do. We define a contrast-based Granularity Axis as the difference between the mean hidden states of macro roles and micro roles. In Qwen3-8B, this axis aligns with the first principal component (PC1) of the role representation space with a cosine similarity of 0.972 and accounts for 52.6% of its variance, indicating that granularity is the dominant geometric axis organizing prompted social roles. We construct 75 social roles across five granularity levels and collect 91,200 role-conditioned responses over shared questions and prompt variants, then extract role-level hidden states and project them onto the axis. Role projections increase monotonically across all five levels, remain stable across layers, prompt variants, endpoint definitions, held-out splits, and score-filtered subsets, and transfer to Llama-3.1-8B-Instruct. The axis is also causally relevant: activation steering along it shifts response granularity in the predicted direction, with Llama moving from 2.00 to 3.17 on a five-point macro scale under positive steering on prompts that admit local responses. The two models differ in controllability, suggesting that steering depends on each model's default operating regime. Overall, our findings suggest that social role granularity is not merely a stylistic surface feature, but a structured, ordered, and causally manipulable latent direction in role-conditioned language model behavior.
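As an illustration of the contrast-based construction described above, the following is a minimal sketch and not the authors' implementation. It assumes role-level hidden states are already available as a NumPy array with one row per role, that level 1 is the micro endpoint and level 5 the macro endpoint, and that "variance accounted for" means the share of total variance lying along the axis. The data are synthetic, and every function name (`synthetic_roles`, `granularity_axis`, `axis_vs_pc1`, `steer`) is hypothetical.

```python
import numpy as np


def synthetic_roles(n_per_level=15, n_levels=5, dim=64, seed=0):
    """Toy stand-in for role-level hidden states (NOT real model activations)."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    # 15 roles at each of 5 levels gives 75 roles, matching the abstract's count.
    levels = np.repeat(np.arange(1, n_levels + 1), n_per_level)
    noise = rng.normal(scale=0.3, size=(levels.size, dim))
    hidden = 3.0 * levels[:, None] * direction + noise
    return hidden, levels


def granularity_axis(hidden, levels, micro_level=1, macro_level=5):
    """Unit-norm contrast axis: mean macro-role state minus mean micro-role state.

    Which levels serve as endpoints is an assumption of this sketch.
    """
    macro_mean = hidden[levels == macro_level].mean(axis=0)
    micro_mean = hidden[levels == micro_level].mean(axis=0)
    axis = macro_mean - micro_mean
    return axis / np.linalg.norm(axis)


def axis_vs_pc1(hidden, axis):
    """Return (|cosine| between the axis and PC1, share of variance along the axis)."""
    centered = hidden - hidden.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # PC1 has arbitrary sign, so compare by absolute cosine; both vectors are unit norm.
    cosine = abs(float(vt[0] @ axis))
    share = float(np.sum((centered @ axis) ** 2) / np.sum(centered ** 2))
    return cosine, share


def steer(hidden_state, axis, alpha):
    """Additive steering: shift a hidden state by alpha along the unit axis."""
    return hidden_state + alpha * axis


hidden, levels = synthetic_roles()
axis = granularity_axis(hidden, levels)
projections = hidden @ axis
level_means = [float(projections[levels == k].mean()) for k in np.unique(levels)]
cosine, share = axis_vs_pc1(hidden, axis)
print("mean projection per level:", np.round(level_means, 2))
print("cosine with PC1:", round(cosine, 3), "variance share along axis:", round(share, 3))
```

With real activations, `hidden` would instead hold per-role averages of a chosen layer's hidden states, and `steer` would be applied to that layer's activations during generation; both steps depend on the model and tooling and are left out of this sketch.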
