MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
May 19, 2026
Authors: Md Mehrab Tanjim, Jayakumar Subramanian, Xiang Chen, Branislav Kveton, Subhojyoti Mukherjee, Anlan Zhang, Sungchul Kim, Somdeb Sarkhel, Sunav Choudhury
cs.AI
Abstract
LLM agents organize behavior through skills: structured natural-language specifications governing how an agent reasons, retrieves, and responds. Unlike monolithic prompts, skills are multi-field artifacts subject to hard platform constraints: description fields are truncated for routing, instruction bodies are compacted via progressive disclosure, and co-resident skills compete for limited context windows. These constraints make skill optimization inherently multi-objective: a skill must simultaneously maximize task performance and satisfy platform limits. Yet existing prompt optimizers either ignore these trade-offs or collapse them into a weighted sum, missing Pareto-optimal variants in non-convex objective regions. We introduce MOCHA (Multi-Objective Chebyshev Annealing), which replaces single-objective selection with Chebyshev scalarization (covering the full Pareto front, including non-convex regions) and combines it with exponential annealing that transitions from exploration to exploitation. In our experiments across six diverse agent skills, where all methods share the same multi-objective mutation operator and baselines receive identical per-objective textual feedback, existing optimizers fail to improve the seed skill on 4 of 6 tasks: 1000 rollouts yield zero progress. MOCHA breaks through on every task, achieving a 7.5% relative improvement in mean correctness over the strongest baseline (up to 14.9% on FEVER and 10.4% on TheoremQA) while discovering more than twice as many Pareto-optimal skill variants.
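The contrast between weighted-sum and Chebyshev scalarization can be illustrated with a small sketch. It is not the authors' implementation: the three skill variants, their scores, the ideal point, the softmax selection rule, and the `t0`, `decay`, and `t_min` parameters are all hypothetical, chosen only to show why a Pareto-optimal point in a non-convex region is invisible to a weighted sum, and how an exponentially decaying temperature can move selection from exploration toward exploitation.

```python
import math

def weighted_sum(objs, weights):
    """Linear scalarization of maximized objectives; higher is better."""
    return sum(w * f for w, f in zip(weights, objs))

def chebyshev(objs, weights, ideal):
    """Weighted Chebyshev distance to the ideal point; lower is better."""
    return max(w * (z - f) for w, f, z in zip(weights, objs, ideal))

def temperature(step, t0=1.0, decay=0.01, t_min=1e-6):
    """Exponentially decaying temperature (all parameters are illustrative)."""
    return max(t_min, t0 * math.exp(-decay * step))

def selection_probs(scores, temp):
    """Softmax over negated scores: near-uniform when hot, near-greedy when cold."""
    best = min(scores)
    unnorm = [math.exp(-(s - best) / temp) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical variants scored on (correctness, compactness), both maximized.
# C is non-dominated but lies below the line joining A and B (non-convex region).
variants = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.45, 0.45)}
ideal = (1.0, 1.0)

def best_by_weighted_sum(w):
    return max(variants, key=lambda k: weighted_sum(variants[k], (w, 1.0 - w)))

def best_by_chebyshev(w):
    return min(variants, key=lambda k: chebyshev(variants[k], (w, 1.0 - w), ideal))

# Sweep the weight on correctness from 0 to 1 and record which variants ever win.
weighted_winners = {best_by_weighted_sum(i / 100) for i in range(101)}
chebyshev_winners = {best_by_chebyshev(i / 100) for i in range(101)}

print(sorted(weighted_winners))   # C never appears
print(sorted(chebyshev_winners))  # C is reachable
```

In this toy setup no weight vector makes the weighted sum prefer C, because either A or B always scores at least 0.5 while C scores 0.45, whereas equal Chebyshev weights select C. Feeding the Chebyshev scores into `selection_probs` with `temperature(step)` gives a spread-out distribution at early steps and a nearly greedy one at late steps.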