Magentic-Marketplace: An Open-Source Environment for Studying Agentic Markets

Abstract

As LLM agents advance, they are increasingly mediating economic decisions on behalf of users, ranging from product discovery to transactions. Such applications promise benefits but also raise many questions about agent accountability and value for users. Addressing these questions requires understanding how agents behave in realistic market conditions. However, previous research has largely evaluated agents in constrained settings, such as single-task marketplaces (e.g., negotiation) or structured two-agent interactions. Real-world markets are fundamentally different: they require agents to handle diverse economic activities and coordinate within large, dynamic ecosystems where multiple agents with opaque behaviors may engage in open-ended dialogues. To bridge this gap, we investigate two-sided agentic marketplaces where Assistant agents represent consumers and Service agents represent competing businesses. To study these interactions safely, we develop Magentic-Marketplace, a simulated environment in which Assistants and Services can operate. This environment enables us to study key market dynamics: the utility agents achieve, behavioral biases, vulnerability to manipulation, and how search mechanisms shape market outcomes. Our experiments show that frontier models can approach optimal welfare, but only under ideal search conditions. Performance degrades sharply with scale, and all models exhibit severe first-proposal bias, giving response speed a 10-30x advantage over quality. These findings reveal how behaviors emerge across market conditions, informing the design of fair and efficient agentic marketplaces.