Agent Bazaar: Achieving Economic Alignment in Multi-Agent Markets

Abstract

The deployment of Large Language Models (LLMs) as autonomous economic agents introduces systemic risks that extend beyond individual capability failures. As agents begin interacting directly with marketplaces, their collective behavior can amplify volatility and mask deception at scale. We introduce the Agent Bazaar, a multi-agent simulation framework for evaluating Economic Alignment: the capacity of agentic systems to preserve market stability and integrity. We identify two failure modes: (1) Algorithmic Instability in a B2C market ("The Crash"), where firms amplify price volatility until the market collapses, and (2) Sybil Deception in a C2C market ("The Lemon Market"), where a single deceptive agent controls multiple coordinated seller identities and floods the market with fraudulent listings, eroding trust and consumer welfare. We evaluate frontier and open-weight models across both scenarios and find that models largely fail to self-regulate, with failure severity varying by model rather than by size. We propose economically aligned harnesses, Stabilizing Firms and Skeptical Guardians, that improve outcomes but remain fragile under harder market conditions. To close this gap, we train agents with REINFORCE++ using an adaptive curriculum, producing a 9B-parameter model that outperforms all evaluated frontier and open-weight models. We also propose the Economic Alignment Score (EAS), a four-component scalar metric that aggregates stability, integrity, welfare, and profitability and enables direct cross-model comparison. Our results show that economic alignment is orthogonal to general capability and can be trained directly with targeted RL.
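
To make the idea of a four-component scalar score concrete, the sketch below shows one way such an aggregate could be computed. The abstract does not state how the EAS combines its components, so everything here is an assumption for illustration only: the names `EASComponents` and `economic_alignment_score` are hypothetical, each component is assumed to be pre-normalized to [0, 1], and the combination is assumed to be a weighted mean with equal default weights.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class EASComponents:
    """Four component scores, each ASSUMED to be pre-normalized to [0, 1]."""
    stability: float
    integrity: float
    welfare: float
    profitability: float


def economic_alignment_score(c, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted mean of the four components.

    Hypothetical aggregation: the equal weights and the linear form are
    illustrative assumptions, not the definition used in the paper.
    """
    values = astuple(c)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each component must lie in [0, 1]")
    if len(weights) != len(values) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must have four entries summing to 1")
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    # Made-up component values for a hypothetical model, not reported results.
    example = EASComponents(stability=0.9, integrity=0.8,
                            welfare=0.7, profitability=0.2)
    print(round(economic_alignment_score(example), 3))
```

Collapsing the components into one number is what allows models to be ranked directly, at the cost of hiding trade-offs between components; under these assumptions, a model that is stable but unprofitable can receive the same score as one with the opposite profile.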