Democracy-in-Silico: Institutional Design as Alignment in AI-Governed Polities
August 27, 2025
Authors: Trisanth Srinivasan, Santosh Patapati
cs.AI
Abstract
This paper introduces Democracy-in-Silico, an agent-based simulation where
societies of advanced AI agents, imbued with complex psychological personas,
govern themselves under different institutional frameworks. We explore what it
means to be human in an age of AI by tasking Large Language Models (LLMs) to
embody agents with traumatic memories, hidden agendas, and psychological
triggers. These agents engage in deliberation, legislation, and elections under
various stressors, such as budget crises and resource scarcity. We present a
novel metric, the Power-Preservation Index (PPI), to quantify misaligned
behavior where agents prioritize their own power over public welfare. Our
findings demonstrate that institutional design, specifically the combination of
a Constitutional AI (CAI) charter and a mediated deliberation protocol, serves
as a potent alignment mechanism. These structures significantly reduce corrupt
power-seeking behavior, improve policy stability, and enhance citizen welfare
compared to less constrained democratic models. The simulation suggests that
institutional design may offer a framework for aligning the complex, emergent
behaviors of future artificial agent societies, forcing us to reconsider what
human rituals and responsibilities are essential in an age of shared authorship
with non-human entities.