Democracy-in-Silico: Institutional Design as Alignment in AI-Governed Polities

Abstract

This paper introduces Democracy-in-Silico, an agent-based simulation in which societies of advanced AI agents, imbued with complex psychological personas, govern themselves under different institutional frameworks. We explore what it means to be human in an age of AI by tasking Large Language Models (LLMs) to embody agents with traumatic memories, hidden agendas, and psychological triggers. These agents engage in deliberation, legislation, and elections under various stressors, such as budget crises and resource scarcity. We present a novel metric, the Power-Preservation Index (PPI), to quantify misaligned behavior in which agents prioritize their own power over public welfare. Our findings demonstrate that institutional design, specifically the combination of a Constitutional AI (CAI) charter and a mediated deliberation protocol, serves as a potent alignment mechanism. Compared to less constrained democratic models, these structures significantly reduce corrupt power-seeking behavior, improve policy stability, and enhance citizen welfare. The simulation reveals that institutional design may offer a framework for aligning the complex, emergent behaviors of future artificial agent societies, forcing us to reconsider what human rituals and responsibilities are essential in an age of shared authorship with non-human entities.
