The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models

Abstract


Large language models (LLMs) are routinely prompted to adopt social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of those roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning. We show that they do. We define a contrast-based Granularity Axis as the difference between the mean hidden states of macro-level and micro-level roles. In Qwen3-8B, this axis aligns with the first principal component (PC1) of the role representation space (cosine similarity 0.972) and accounts for 52.6% of that space's variance, indicating that granularity is the dominant geometric axis organizing prompted social roles. We construct 75 social roles spanning five granularity levels, collect 91,200 role-conditioned responses to shared questions under multiple prompt variants, extract role-level hidden states, and project them onto the axis. Role projections increase monotonically across all five levels; they remain stable across layers, prompt variants, endpoint definitions, held-out splits, and score-filtered subsets; and the pattern transfers to Llama-3.1-8B-Instruct. The axis is also causally relevant: activation steering along it shifts response granularity in the predicted direction. On prompts that admit local responses, positive steering moves Llama from 2.00 to 3.17 on a five-point macro scale. The two models differ in how controllable they are, suggesting that the effect of steering depends on each model's default operating regime. Overall, our findings suggest that social role granularity is not merely a surface stylistic feature but a structured, ordered, and causally manipulable latent direction in role-conditioned language model behavior.
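The axis construction and the checks summarized in the abstract reduce to a few lines of linear algebra. Below is a minimal NumPy sketch. It assumes role-level hidden states have already been extracted into a (roles × hidden_dim) array, with integer granularity levels from 1 (micro) to 5 (macro). The function names, the synthetic data, and the choice to read "accounts for X% of the variance" as the share of centered variance lying along the axis are illustrative assumptions, not the authors' code. Steering is shown only as the vector shift. Wiring it into a model's forward pass at a chosen layer is not shown.

```python
import numpy as np


def granularity_axis(states, levels, micro=1, macro=5):
    """Contrast axis: mean hidden state of macro roles minus mean of micro roles."""
    levels = np.asarray(levels)
    return states[levels == macro].mean(axis=0) - states[levels == micro].mean(axis=0)


def first_pc(states):
    """First principal component (unit vector) and its explained-variance ratio."""
    centered = states - states.mean(axis=0)
    # Rows of vt are principal directions; squared singular values are proportional to variances.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0], float(s[0] ** 2 / np.sum(s ** 2))


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def project(states, axis):
    """Scalar coordinate of each state along the unit-normalized axis."""
    return states @ (axis / np.linalg.norm(axis))


def variance_fraction(states, axis):
    """Share of total (centered) variance that lies along the axis direction."""
    centered = states - states.mean(axis=0)
    return float(np.sum(project(centered, axis) ** 2) / np.sum(centered ** 2))


def level_means(projections, levels):
    """Mean projection per granularity level, in ascending level order."""
    levels = np.asarray(levels)
    return [float(projections[levels == k].mean()) for k in np.unique(levels)]


def steer(hidden, axis, alpha):
    """Activation steering: shift hidden state(s) by alpha along the unit axis."""
    return hidden + alpha * (axis / np.linalg.norm(axis))


# --- Synthetic demo (hypothetical data, not the paper's measurements) ---
rng = np.random.default_rng(0)
dim, per_level = 32, 15
levels = np.repeat(np.arange(1, 6), per_level)  # 75 roles across 5 levels
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)
# Plant a level-dependent signal along one direction, plus isotropic noise.
states = 3.0 * (levels[:, None] - 3) * true_dir + rng.normal(size=(levels.size, dim))

axis = granularity_axis(states, levels)
pc1, pc1_ratio = first_pc(states)
print("|cos(axis, PC1)| =", round(abs(cosine(axis, pc1)), 3))  # PC1 sign is arbitrary
print("PC1 variance ratio =", round(pc1_ratio, 3))
print("variance along axis =", round(variance_fraction(states, axis), 3))
print("level means:", np.round(level_means(project(states, axis), levels), 2))
```

PC1 is defined only up to sign, so the sketch compares absolute cosine. Normalizing the axis before projecting or steering makes `alpha` a displacement in hidden-state units. Without normalization, the step size would scale with the size of the macro/micro gap.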
