Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

Abstract

Recent advances in prompt learning allow large language model agents to acquire task-relevant knowledge from inference-time context without parameter changes. For example, existing methods such as ACE and GEPA learn system prompts from previous agent runs to improve accuracy. However, these methods focus primarily on single-agent or low-parallelism settings, which fundamentally limits their ability to learn efficiently from a large set of collected agentic traces. Running prompt learning in parallel would be both efficient and beneficial, given the growing trend of learning from many agentic traces or parallel agent executions. Yet without a principled scaling strategy, current methods suffer quality degradation at high parallelism. To improve both the efficiency and quality of prompt learning, we propose Combee, a novel framework for scaling parallel prompt learning for self-improving agents. Combee speeds up learning and enables running many agents in parallel while learning from their aggregate traces without quality degradation. To achieve this, Combee leverages parallel scans and employs an augmented shuffle mechanism; it also introduces a dynamic batch size controller to balance quality and delay. Evaluations on AppWorld, Terminal-Bench, Formula, and FiNER demonstrate that Combee achieves up to 17x speedup over previous methods with comparable or better accuracy and equivalent cost.
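
The abstract names parallel scans as one of Combee's building blocks but does not say how they are applied. As background only, the sketch below shows a generic Hillis-Steele inclusive scan over an associative merge. It is not Combee's implementation: the names `inclusive_scan` and `merge_notes`, and the idea of representing per-trace lessons as tuples of strings, are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def inclusive_scan(items: Sequence[T], combine: Callable[[T, T], T]) -> List[T]:
    """Generic Hillis-Steele inclusive scan.

    Takes O(log n) rounds. Within one round every combine call is
    independent of the others, so the calls could be issued in parallel;
    here they run sequentially for simplicity. `combine` must be associative.
    """
    result = list(items)
    step = 1
    while step < len(result):
        # Position i (for i >= step) absorbs the partial result `step` slots to its left.
        result = [
            combine(result[i - step], result[i]) if i >= step else result[i]
            for i in range(len(result))
        ]
        step *= 2
    return result


def merge_notes(left: tuple, right: tuple) -> tuple:
    """Hypothetical associative merge: keep left, append notes not already seen."""
    return left + tuple(note for note in right if note not in left)


# Hypothetical per-trace lessons (illustrative data, not from the paper).
notes = [("a",), ("b",), ("a", "c"), ("d",)]
prefixes = inclusive_scan(notes, merge_notes)
print(prefixes[-1])
```

The last element of `prefixes` is the merge of all inputs, and each earlier element is the merge of the inputs up to that position. The scan matches a left-to-right sequential fold only because `merge_notes` is associative, which is the property any operator used this way would need.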
