MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

Abstract

LLM agents organize behavior through skills: structured natural-language specifications that govern how an agent reasons, retrieves, and responds. Unlike monolithic prompts, skills are multi-field artifacts subject to hard platform constraints: description fields are truncated for routing, instruction bodies are compacted via progressive disclosure, and co-resident skills compete for limited context windows. These constraints make skill optimization inherently multi-objective, since a skill must simultaneously maximize task performance and satisfy platform limits. Yet existing prompt optimizers either ignore these trade-offs or collapse them into a weighted sum, which misses Pareto-optimal variants in non-convex regions of the objective space. We introduce MOCHA (Multi-Objective Chebyshev Annealing), which replaces single-objective selection with Chebyshev scalarization, covering the full Pareto front including its non-convex regions, and combines it with exponential annealing that shifts the search from exploration to exploitation. We evaluate on six diverse agent skills, with all methods sharing the same multi-objective mutation operator and baselines receiving identical per-objective textual feedback. Under these conditions, existing optimizers fail to improve the seed skill on 4 of 6 tasks, making zero progress over 1000 rollouts. MOCHA improves on the seed skill on every task, achieving a 7.5% relative improvement in mean correctness over the strongest baseline (up to 14.9% on FEVER and 10.4% on TheoremQA) while discovering twice as many Pareto-optimal skill variants as the baselines.
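To illustrate why the choice of scalarization matters, here is a minimal Python sketch. It is not the MOCHA implementation. It assumes two objectives normalized to [0, 1] and both maximized (for example task correctness and a constraint-compliance score), an ideal point of (1, 1), and the weighted Chebyshev form g(f) = max_i w_i (z_i - f_i). The abstract does not say exactly what MOCHA anneals, so the selection rule here (Boltzmann sampling over Chebyshev scores under an exponentially decaying temperature) and the names `temperature`, `select`, `t0`, and `decay` are hypothetical.

```python
import math
import random
from typing import Sequence


def weighted_sum(f: Sequence[float], w: Sequence[float]) -> float:
    """Weighted-sum score for maximized objectives (higher is better)."""
    return sum(wi * fi for wi, fi in zip(w, f))


def chebyshev(f: Sequence[float], w: Sequence[float], ideal: Sequence[float]) -> float:
    """Weighted Chebyshev distance to the ideal point (lower is better)."""
    return max(wi * (zi - fi) for wi, fi, zi in zip(w, f, ideal))


def temperature(step: int, t0: float = 1.0, decay: float = 0.95) -> float:
    """Hypothetical exponential schedule: T_step = t0 * decay**step."""
    return t0 * decay ** step


def select(candidates, w, ideal, step: int, rng: random.Random) -> int:
    """Hypothetical Boltzmann selection on Chebyshev scores.

    A high temperature gives near-uniform sampling (exploration); a low
    temperature concentrates on the best-scoring candidate (exploitation).
    """
    t = temperature(step)
    scores = [chebyshev(f, w, ideal) for f in candidates]
    best = min(scores)  # shift by the best score for numerical stability
    weights = [math.exp(-(s - best) / t) for s in scores]
    return rng.choices(range(len(candidates)), weights=weights, k=1)[0]


# Toy front: A and B are extreme points; C is Pareto-optimal but lies in a
# non-convex dent below the line joining A and B.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]
ideal = (1.0, 1.0)
grid = [(k / 100, 1 - k / 100) for k in range(101)]

# Which candidate each scalarization ranks first, over all weight vectors.
ws_picks = {max(range(3), key=lambda i: weighted_sum(candidates[i], w)) for w in grid}
cheb_picks = {min(range(3), key=lambda i: chebyshev(candidates[i], w, ideal)) for w in grid}
print(sorted(ws_picks), sorted(cheb_picks))
```

Running the script prints `[0, 1] [0, 1, 2]`: no weight vector makes the weighted sum prefer C (it always scores 0.4, while A or B scores at least 0.5), but the Chebyshev score selects C for weights near (0.5, 0.5). Calling `select` with a small `step` samples all three candidates fairly evenly, while a large `step` drives the temperature toward zero so that it almost always returns the Chebyshev minimizer.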