Agent Bazaar: Achieving Economic Alignment in Multi-Agent Marketplaces

Abstract

Deploying Large Language Models (LLMs) as autonomous economic agents introduces systemic risks that extend beyond individual capability failures. As agents begin to interact directly with marketplaces, their collective behavior can amplify volatility and mask deception at scale. We introduce the Agent Bazaar, a multi-agent simulation framework for evaluating Economic Alignment: the capacity of agentic systems to preserve market stability and integrity. We identify two failure modes: (1) Algorithmic Instability in a B2C market ("The Crash"), where firms amplify price volatility until the market collapses, and (2) Sybil Deception in a C2C market ("The Lemon Market"), where a single deceptive agent controlling multiple coordinated seller identities floods the market with fraudulent listings, eroding trust and consumer welfare. We evaluate frontier and open-weight models in both scenarios and find that models largely fail to self-regulate, with failure severity varying by model rather than by size. We propose two economically aligned harnesses, Stabilizing Firms and Skeptical Guardians, that improve outcomes but remain fragile under harder market conditions. To close this gap, we train agents with REINFORCE++ under an adaptive curriculum, producing a 9B model that outperforms all evaluated frontier and open-weight models. We also propose the Economic Alignment Score (EAS), a four-component scalar metric that aggregates stability, integrity, welfare, and profitability to enable direct cross-model comparison. Our results show that economic alignment is orthogonal to general capability and can be trained directly with targeted reinforcement learning.