Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

Abstract

Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on value alignment: how closely a single agent matches a target culture. Yet alignment is a per-agent property and cannot reveal whether a system, taken as a whole, preserves the cultural plurality it is meant to represent. We propose value diversity as a system-level evaluation axis for multicultural agent systems, defined through the dissimilarity between culturally conditioned agents' responses on a shared value survey. Using the World Values Survey, we evaluate 19 cultures and 18 backbone models across a wide range of system configurations. We find that diversity is largely uncorrelated with alignment, indicating that the two capture complementary system properties, and that current multicultural agent systems fall substantially below human societies in value diversity. Mixed-backbone systems narrow this gap but do not close it, and the gap persists across culture compositions and agent scales. Social interaction further erodes diversity by driving agents toward consensus, and a participatory budgeting case study shows that this homogenization narrows the breadth of collective decision-making. Together, our results establish value diversity as a distinct evaluation axis for multicultural multi-agent systems and reveal a persistent homogenization tendency in current LLM-based societies. Our code and data are publicly available at https://github.com/iNLP-Lab/MultiAgent-Diversity.
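
To make the notion of a system-level diversity score concrete, the following is a minimal sketch of one way such a quantity could be computed from survey responses. The abstract does not specify the dissimilarity measure, so the choice here (mean range-normalized absolute difference per question, averaged over all agent pairs) is an assumption for illustration, as are the function names `pairwise_dissimilarity` and `value_diversity`, the agent labels, and the toy answers. It is not the implementation in the released repository.

```python
from itertools import combinations


def pairwise_dissimilarity(a, b, scale_ranges):
    """Mean range-normalized absolute difference between two answer vectors.

    Assumption: every survey item is ordinal with a known (low, high) range.
    """
    if not (len(a) == len(b) == len(scale_ranges)):
        raise ValueError("answer vectors and scale_ranges must have equal length")
    total = 0.0
    for x, y, (low, high) in zip(a, b, scale_ranges):
        total += abs(x - y) / (high - low)
    return total / len(a)


def value_diversity(responses, scale_ranges):
    """Average pairwise dissimilarity over all agents in one system.

    `responses` maps an agent label to its list of answers on the shared survey.
    Returns 0.0 for systems with fewer than two agents.
    """
    agents = list(responses.values())
    if len(agents) < 2:
        return 0.0
    pairs = list(combinations(agents, 2))
    return sum(pairwise_dissimilarity(a, b, scale_ranges) for a, b in pairs) / len(pairs)


# Toy example with three hypothetical agents and three survey items.
scale_ranges = [(1, 4), (1, 10), (1, 4)]
responses = {
    "agent_a": [1, 10, 2],
    "agent_b": [1, 10, 2],  # identical to agent_a: contributes no diversity
    "agent_c": [4, 1, 2],
}
print(round(value_diversity(responses, scale_ranges), 3))  # 0.444
```

Under this sketch a score of 0 means every agent answered identically and a score of 1 means every pair of agents sits at opposite ends of every item. The same function applied to human respondents would give a reference level, which is the kind of comparison the abstract describes.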