Parametric Social Identity Injection and Diversification in Public Opinion Simulation

Abstract

Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, offering a promising alternative to costly and slow human surveys. Although these methods scale well, current LLM-based simulations fail to capture social diversity: differences between groups are flattened, and responses are overly homogeneous across demographic groups. We trace this limitation to a Diversity Collapse phenomenon in LLM hidden representations, in which distinct social identities become increasingly indistinguishable as they pass through successive layers. Motivated by this observation, we propose Parametric Social Identity Injection (PSII), a general framework that injects explicit, parametric representations of demographic attributes and value orientations directly into the intermediate hidden states of LLMs. Unlike prompt-based persona conditioning, PSII enables fine-grained and controllable identity modulation at the representation level. Extensive experiments on the World Values Survey with multiple open-source LLMs show that PSII significantly improves both distributional fidelity, measured as lower KL divergence from real-world survey data, and overall response diversity. This work provides new insights into representation-level control of LLM agents and advances scalable, diversity-aware public opinion simulation.
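To make the fidelity metric concrete, the following minimal sketch computes the KL divergence between a survey answer distribution and two simulated ones. The answer lists, the four-option scale, the additive smoothing constant, and the helper names `answer_distribution` and `kl_divergence` are all illustrative assumptions; they are not taken from the paper or from the World Values Survey.

```python
import math
from collections import Counter

def answer_distribution(answers, options, smoothing=1e-6):
    """Empirical distribution over answer options, with additive smoothing
    (an assumption here) so that no option has probability exactly zero."""
    counts = Counter(answers)
    total = len(answers) + smoothing * len(options)
    return [(counts.get(o, 0) + smoothing) / total for o in options]

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same options."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical answers to one four-option survey question.
options = [1, 2, 3, 4]
survey = [1, 1, 2, 2, 3, 4, 4, 4]      # stand-in for human responses
collapsed = [2, 2, 2, 2, 2, 2, 3, 2]   # homogeneous simulated responses
diverse = [1, 2, 2, 1, 3, 4, 4, 3]     # more varied simulated responses

p = answer_distribution(survey, options)
kl_collapsed = kl_divergence(p, answer_distribution(collapsed, options))
kl_diverse = kl_divergence(p, answer_distribution(diverse, options))

print(f"KL(survey || collapsed) = {kl_collapsed:.3f}")
print(f"KL(survey || diverse)   = {kl_diverse:.3f}")
```

In this toy setting the homogeneous simulation sits much farther from the survey distribution than the varied one, which is the kind of gap a lower KL divergence is meant to capture.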