MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

May 19, 2026
Authors: Md Mehrab Tanjim, Jayakumar Subramanian, Xiang Chen, Branislav Kveton, Subhojyoti Mukherjee, Anlan Zhang, Sungchul Kim, Somdeb Sarkhel, Sunav Choudhury
cs.AI

Abstract

LLM agents organize behavior through skills: structured natural-language specifications governing how an agent reasons, retrieves, and responds. Unlike monolithic prompts, skills are multi-field artifacts subject to hard platform constraints: description fields are truncated for routing, instruction bodies are compacted via progressive disclosure, and co-resident skills compete for limited context windows. These constraints make skill optimization inherently multi-objective: a skill must simultaneously maximize task performance and satisfy platform limits. Yet existing prompt optimizers either ignore these trade-offs or collapse them into a weighted sum, missing Pareto-optimal variants in non-convex regions of the objective space. We introduce MOCHA (Multi-Objective Chebyshev Annealing), which replaces single-objective selection with Chebyshev scalarization, covering the full Pareto front including its non-convex regions, and combines it with exponential annealing that transitions from exploration to exploitation. We evaluate on six diverse agent skills, with all methods sharing the same multi-objective mutation operator and baselines receiving identical per-objective textual feedback. Existing optimizers fail to improve the seed skill on 4 of the 6 tasks, making zero progress over 1000 rollouts. MOCHA improves the seed skill on every task, achieving a 7.5% relative improvement in mean correctness over the strongest baseline (up to 14.9% on FEVER and 10.4% on TheoremQA) while discovering twice as many Pareto-optimal skill variants as the baselines.
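The abstract contrasts weighted-sum selection with Chebyshev scalarization and mentions exponential annealing, but gives no formulas. The sketch below is a minimal illustration under our own assumptions, not MOCHA's implementation: two maximized objectives (for example correctness and constraint compliance) with made-up scores, an ideal point z* = (1, 1), the weighted Chebyshev score g(x) = max_i w_i (z*_i - f_i(x)) (lower is better), and annealing modeled as a softmax selection temperature T_t = t0 · decay^t. The function names and the defaults `t0=1.0`, `decay=0.9` are hypothetical.

```python
import math
import random


def weighted_sum(scores, weights):
    """Linear scalarization of maximized objectives: higher is better."""
    return sum(w * f for w, f in zip(weights, scores))


def chebyshev(scores, weights, ideal):
    """Weighted Chebyshev scalarization for maximized objectives: lower is better.

    Returns the largest weighted shortfall from the ideal point, so a balanced
    candidate can win even when it lies in a non-convex region of the front.
    """
    return max(w * (z - f) for w, f, z in zip(weights, scores, ideal))


def annealed_temperature(step, t0=1.0, decay=0.9):
    """Hypothetical exponential schedule: T_t = t0 * decay**step."""
    return t0 * decay ** step


def select(cands, weights, ideal, step, rng):
    """Sample a candidate with probability proportional to exp(-g / T_t).

    High temperature early gives near-uniform sampling (exploration);
    low temperature late is almost greedy on the Chebyshev score (exploitation).
    """
    temp = annealed_temperature(step)
    names = list(cands)
    g = [chebyshev(cands[n], weights, ideal) for n in names]
    g_min = min(g)  # shift scores so the best candidate gets exp(0) = 1
    probs = [math.exp(-(gi - g_min) / temp) for gi in g]
    return rng.choices(names, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Illustrative scores (correctness, constraint compliance), not paper results.
    # A and C are extremes; B is balanced but lies below the A-C segment,
    # i.e. in a non-convex region of the front.
    cands = {"A": (1.0, 0.0), "B": (0.45, 0.45), "C": (0.0, 1.0)}
    ideal = (1.0, 1.0)

    # No normalized weight vector makes B the weighted-sum winner...
    for w1 in (0.1, 0.3, 0.5, 0.7, 0.9):
        w = (w1, 1 - w1)
        best = max(cands, key=lambda n: weighted_sum(cands[n], w))
        print("weighted sum", w, "->", best)

    # ...but equal-weight Chebyshev scalarization selects it.
    w = (0.5, 0.5)
    print("chebyshev ->", min(cands, key=lambda n: chebyshev(cands[n], w, ideal)))

    # Annealing: early steps spread picks out, late steps concentrate on B.
    rng = random.Random(0)
    for step in (0, 40):
        picks = [select(cands, w, ideal, step, rng) for _ in range(1000)]
        print("step", step, {n: picks.count(n) for n in cands})
```

With these toy scores, B never wins a weighted sum because max(w, 1 - w) is at least 0.5 while B scores 0.45, yet equal-weight Chebyshev scalarization ranks it first. Early steps sample all three candidates, while late steps select B almost exclusively.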