Parametric Social Identity Injection and Diversification in Public Opinion Simulation

Abstract

Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, offering a promising alternative to costly and slow human surveys. Despite their scalability, current LLM-based simulation methods fail to capture social diversity: they flatten differences between demographic groups and produce overly homogeneous responses. We trace this limitation to a phenomenon we call Diversity Collapse, in which distinct social identities become progressively less distinguishable in the hidden representations of LLMs as depth increases. Motivated by this observation, we propose Parametric Social Identity Injection (PSII), a general framework that injects explicit, parametric representations of demographic attributes and value orientations directly into the intermediate hidden states of LLMs. Unlike prompt-based persona conditioning, PSII enables fine-grained, controllable identity modulation at the representation level. Extensive experiments on the World Values Survey with multiple open-source LLMs show that PSII significantly improves both distributional fidelity and diversity: it reduces the KL divergence from real-world survey response distributions while increasing overall response diversity. This work provides new insights into representation-level control of LLM agents and advances scalable, diversity-aware public opinion simulation.
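The abstract does not specify how the parametric identity representations are constructed or where they enter the network. As a minimal sketch only, assuming (hypothetically) that each (attribute, value) pair maps to its own vector, that a persona's identity vector is the sum of its attribute vectors, and that this vector is added with a scale `alpha` to the hidden states of one intermediate layer, the three quantities the abstract refers to can be illustrated in plain Python: the additive injection, a simple proxy for Diversity Collapse (mean pairwise cosine similarity between per-group mean hidden states), and the KL divergence between real and simulated answer distributions. All function names, the additive composition, and the KL direction are illustrative assumptions, not the PSII implementation.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def make_attribute_table(
    attributes: Dict[str, Sequence[str]], dim: int, seed: int = 0
) -> Dict[str, Dict[str, Vector]]:
    """Hypothetical parameter table: one vector per (attribute, value) pair.

    Randomly initialised here; in practice these would be learned parameters.
    """
    rng = random.Random(seed)
    return {
        attr: {val: [rng.gauss(0.0, 1.0) for _ in range(dim)] for val in values}
        for attr, values in attributes.items()
    }


def identity_vector(
    table: Dict[str, Dict[str, Vector]], profile: Dict[str, str], dim: int
) -> Vector:
    """Compose a persona's identity vector by summing its attribute vectors (assumed)."""
    out = [0.0] * dim
    for attr, val in profile.items():
        for i, x in enumerate(table[attr][val]):
            out[i] += x
    return out


def inject(hidden: List[Vector], identity: Vector, alpha: float) -> List[Vector]:
    """Add alpha * identity to the hidden state at every token position of one layer."""
    return [[h + alpha * e for h, e in zip(token, identity)] for token in hidden]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mean_pairwise_cosine(centroids: List[Vector]) -> float:
    """Collapse proxy: mean cosine similarity between per-group mean hidden states.

    Values approaching 1.0 indicate that the groups are becoming indistinguishable.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two group centroids")
    pairs = [
        cosine(centroids[i], centroids[j])
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    ]
    return sum(pairs) / len(pairs)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) over answer options; p = real survey, q = simulated (assumed direction)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Usage with made-up numbers, for illustration only.
dim = 4
table = make_attribute_table({"country": ["DE", "JP"], "age": ["18-29", "60+"]}, dim)
e = identity_vector(table, {"country": "JP", "age": "60+"}, dim)
hidden = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.5, 0.2]]  # two token positions
steered = inject(hidden, e, alpha=0.5)
print(mean_pairwise_cosine([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]))
print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
```

In an actual transformer, the `inject` step would operate on the output of one chosen intermediate layer (for example, through a PyTorch forward hook), and the collapse proxy would be computed layer by layer from hidden states of prompts conditioned on different personas. The layer index, the scale `alpha`, and the additive composition are free design choices of this sketch.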