Scaling Small Agents Through Strategy Auctions
February 2, 2026
Authors: Lisa Alazraki, William F. Shen, Yoram Bachrach, Akhil Mathur
cs.AI
Abstract
Small language models are increasingly viewed as a promising, cost-effective approach to agentic AI, with proponents claiming they are sufficiently capable for agentic workflows. However, while smaller agents can closely match larger ones on simple tasks, it remains unclear how their performance scales with task complexity, when large models become necessary, and how to better leverage small agents for long-horizon workloads. In this work, we empirically show that small agents' performance fails to scale with task complexity on deep search and coding tasks, and we introduce Strategy Auctions for Workload Efficiency (SALE), an agent framework inspired by freelancer marketplaces. In SALE, agents bid with short strategic plans, which are scored by a systematic cost-value mechanism and refined via a shared auction memory, enabling per-task routing and continual self-improvement without training a separate router or running all models to completion. Across deep search and coding tasks of varying complexity, SALE reduces reliance on the largest agent by 53%, lowers overall cost by 35%, and consistently improves upon the largest agent's pass@1 with only a negligible overhead beyond executing the final trace. In contrast, established routers that rely on task descriptions either underperform the largest agent or fail to reduce cost -- often both -- underscoring their poor fit for agentic workflows. These results suggest that while small agents may be insufficient for complex workloads, they can be effectively "scaled up" through coordinated task allocation and test-time self-improvement. More broadly, they motivate a systems-level view of agentic AI in which performance gains come less from ever-larger individual models and more from market-inspired coordination mechanisms that organize heterogeneous agents into efficient, adaptive ecosystems.
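The auction loop described above can be made concrete. The sketch below is a minimal Python reading of the abstract's description, not the paper's implementation: the bid format, the value-minus-cost score, the success-estimate heuristic, and the memory schema are all illustrative assumptions. The property it mirrors is that only the winning agent executes the task, so overhead stays close to the cost of the final trace.

```python
# Minimal sketch of a SALE-style strategy auction. All names and the scoring
# rule are hypothetical; the abstract does not specify them.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    cost_per_call: float  # assumed relative execution cost
    capability: float     # assumed success prior, a stand-in for model strength

    def bid(self, task: str, memory: list) -> dict:
        # A real agent would draft a short strategic plan with an LLM call;
        # here we stub the plan and estimate success from the shared memory.
        prior = [m["success"] for m in memory if m["agent"] == self.name]
        est = sum(prior) / len(prior) if prior else self.capability
        return {"agent": self, "plan": f"[{self.name}] plan for: {task}",
                "est_success": est}

def score(bid: dict, value_of_success: float = 1.0) -> float:
    # Hypothetical cost-value mechanism: expected value minus execution cost.
    return value_of_success * bid["est_success"] - bid["agent"].cost_per_call

def run_auction(task: str, agents: list, memory: list) -> dict:
    bids = [agent.bid(task, memory) for agent in agents]
    winner = max(bids, key=score)  # per-task routing: only the winner executes
    outcome = winner["est_success"] > 0.5  # stand-in for running the trace
    memory.append({"agent": winner["agent"].name, "task": task,
                   "success": float(outcome)})  # shared auction memory
    return winner

memory: list = []
fleet = [Agent("small-3b", cost_per_call=0.01, capability=0.55),
         Agent("mid-14b", cost_per_call=0.05, capability=0.70),
         Agent("large-70b", cost_per_call=0.30, capability=0.90)]
print(run_auction("multi-hop deep search query", fleet, memory)["plan"])
```

Under this toy scoring, a smaller agent wins whenever its expected value net of cost beats the largest model's, which is the routing behavior the abstract attributes to SALE; a real system would replace the stubbed estimate with scoring of the submitted plans and update the memory with executed outcomes.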