Scaling Small Agents Through Strategy Auctions
February 2, 2026
Authors: Lisa Alazraki, William F. Shen, Yoram Bachrach, Akhil Mathur
cs.AI
Abstract
Small language models are increasingly viewed as a promising, cost-effective approach to agentic AI, with proponents claiming they are sufficiently capable for agentic workflows. However, while smaller agents can closely match larger ones on simple tasks, it remains unclear how their performance scales with task complexity, when large models become necessary, and how to better leverage small agents for long-horizon workloads. In this work, we empirically show that small agents' performance fails to scale with task complexity on deep search and coding tasks, and we introduce Strategy Auctions for Workload Efficiency (SALE), an agent framework inspired by freelancer marketplaces. In SALE, agents bid with short strategic plans, which are scored by a systematic cost-value mechanism and refined via a shared auction memory, enabling per-task routing and continual self-improvement without training a separate router or running all models to completion. Across deep search and coding tasks of varying complexity, SALE reduces reliance on the largest agent by 53%, lowers overall cost by 35%, and consistently improves upon the largest agent's pass@1 with only a negligible overhead beyond executing the final trace. In contrast, established routers that rely on task descriptions either underperform the largest agent or fail to reduce cost -- often both -- underscoring their poor fit for agentic workflows. These results suggest that while small agents may be insufficient for complex workloads, they can be effectively "scaled up" through coordinated task allocation and test-time self-improvement. More broadly, they motivate a systems-level view of agentic AI in which performance gains come less from ever-larger individual models and more from market-inspired coordination mechanisms that organize heterogeneous agents into efficient, adaptive ecosystems.
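To make the mechanism concrete, below is a minimal Python sketch of a SALE-style strategy auction. The abstract only states that agents bid with short plans, that bids are scored by a cost-value mechanism, and that a shared auction memory refines future bids; the specific scoring rule here (calibrated expected value minus estimated cost), the Bid fields, and the calibration map standing in for the auction memory are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    agent: str          # bidder identity, e.g. "small-3b" or "large-70b"
    plan: str           # short strategic plan submitted as the bid
    est_success: float  # self-estimated probability of solving the task
    est_cost: float     # estimated execution cost (e.g. in USD)

def score(bid: Bid, calibration: dict, task_value: float = 1.0) -> float:
    """Hypothetical cost-value score: calibrated expected value minus cost.

    `calibration` maps agent name -> observed/claimed success ratio,
    distilled from the shared auction memory (1.0 for agents with no
    history yet). The paper does not specify its scoring rule; this is
    one plausible instantiation.
    """
    p = min(1.0, bid.est_success * calibration.get(bid.agent, 1.0))
    return p * task_value - bid.est_cost

def run_auction(bids: list, calibration: dict) -> Bid:
    """Route the task to the single highest-scoring bidder. Only the
    winner executes, so losing bids cost only a short plan."""
    return max(bids, key=lambda b: score(b, calibration))

# Example: on an easy task, the small agent's credible, cheap plan
# outbids the large agent despite a lower self-estimated success rate.
bids = [
    Bid("small-3b", "grep the repo, patch the failing test", 0.70, 0.02),
    Bid("large-70b", "full multi-step refactor with verification", 0.90, 0.40),
]
calibration = {"small-3b": 0.95, "large-70b": 1.0}  # from auction memory
print(run_auction(bids, calibration).agent)  # -> small-3b
```

The property the sketch preserves is the one the abstract emphasizes: no separate router is trained and only the winning trace is executed, which is why the overhead beyond the final trace stays negligible.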