Combee: A Scalable Prompt Learning Framework for Self-Improving Language Model Agents

Abstract

Recent advances in prompt learning allow large language model (LLM) agents to acquire task-relevant knowledge from inference-time context without changing model parameters. For example, existing methods such as ACE and GEPA learn system prompts from previous agent runs to improve accuracy. However, these methods primarily target single-agent or low-parallelism settings, which fundamentally limits their ability to learn efficiently from a large set of collected agentic traces. As learning from many agentic traces or parallel agent executions becomes more common, running prompt learning in parallel would be both efficient and beneficial. Yet without a principled scaling strategy, current methods suffer quality degradation at high parallelism. To improve both the efficiency and the quality of prompt learning, we propose Combee, a novel framework for scaling parallel prompt learning for self-improving agents. Combee speeds up learning and enables many agents to run in parallel while learning from their aggregate traces without quality degradation. To achieve this, Combee leverages parallel scans and an augmented shuffle mechanism, and introduces a dynamic batch size controller to balance quality and delay. Evaluations on AppWorld, Terminal-Bench, Formula, and FiNER show that Combee achieves up to a 17x speedup over previous methods with comparable or better accuracy and equivalent cost.
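The abstract names parallel scans as one of Combee's building blocks but does not say how they are applied. As a rough illustration of the general primitive only, and not of Combee's actual algorithm, the sketch below implements a Hillis-Steele inclusive scan over an associative merge operator. The `merge_lessons` operator and the idea of representing each trace's takeaways as a list of strings are hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def inclusive_scan(items: Sequence[T], combine: Callable[[T, T], T]) -> List[T]:
    """Hillis-Steele inclusive scan: out[i] = items[0] * ... * items[i].

    `combine` must be associative. Within each round the combines are
    independent of one another, so they could be dispatched in parallel;
    they run sequentially here to keep the sketch simple.
    """
    out = list(items)
    step = 1
    while step < len(out):
        prev = out  # each round reads only the previous round's values
        out = [
            prev[i] if i < step else combine(prev[i - step], prev[i])
            for i in range(len(prev))
        ]
        step *= 2
    return out


def merge_lessons(left: List[str], right: List[str]) -> List[str]:
    """Hypothetical merge operator: ordered union that drops duplicates."""
    seen = set(left)
    merged = list(left)
    for lesson in right:
        if lesson not in seen:
            seen.add(lesson)
            merged.append(lesson)
    return merged


# Hypothetical per-trace lessons, one list per agent run.
per_trace = [
    ["check login first"],
    ["paginate API results"],
    ["check login first", "retry on timeout"],
    ["validate output schema"],
]
prefixes = inclusive_scan(per_trace, merge_lessons)
print(prefixes[-1])
```

The last prefix holds the merged lessons from all traces, and the scan needs only about log2(n) rounds for n traces. An actual prompt-learning system would presumably replace `merge_lessons` with a model-driven merge step, which is where quality at high parallelism becomes a concern.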