MOCHA: Agent Skill Optimization via Multi-Objective Chebyshev Annealing

Abstract

LLM agents organize behavior through skills: structured natural-language specifications that govern how an agent reasons, retrieves, and responds. Unlike monolithic prompts, skills are multi-field artifacts subject to hard platform constraints. Description fields are truncated for routing, instruction bodies are compacted via progressive disclosure, and co-resident skills compete for limited context windows. These constraints make skill optimization inherently multi-objective, since a skill must simultaneously maximize task performance and satisfy platform limits. Yet existing prompt optimizers either ignore these trade-offs or collapse them into a weighted sum, which misses Pareto-optimal variants in non-convex regions of the objective space. We introduce MOCHA (Multi-Objective Chebyshev Annealing), which replaces single-objective selection with Chebyshev scalarization, covering the full Pareto front including its non-convex regions, and combines it with exponential annealing that shifts the search from exploration to exploitation. We evaluate on six diverse agent skills; all methods share the same multi-objective mutation operator, and the baselines receive identical per-objective textual feedback. Under these conditions, existing optimizers fail to improve on the seed skill in 4 of 6 tasks, making zero progress over 1,000 rollouts. MOCHA improves on the seed in every task, achieving a 7.5% relative improvement in mean correctness over the strongest baseline (reaching 14.9% on FEVER and 10.4% on TheoremQA) while discovering more than twice as many Pareto-optimal skill variants.
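The contrast between weighted-sum and Chebyshev selection, and the role of an annealing schedule, can be illustrated on a toy example. This is a minimal sketch, not MOCHA's implementation. It assumes both objectives are scaled so that higher is better, uses a hand-picked three-point front, and hypothetically models "exponential annealing" as a geometrically decaying temperature in Boltzmann selection over Chebyshev scores, with a weight vector resampled at each step. The function names (`weighted_sum`, `chebyshev`, `exp_temperature`, `annealed_select`) and the parameters `t_start` and `t_end` are illustrative assumptions.

```python
import math
import random

# Hypothetical sketch, not the paper's code.
# Objective vectors are "higher is better" (e.g. correctness, constraint satisfaction).

def weighted_sum(objs, weights):
    """Linear scalarization: sum_i w_i * f_i (higher is better)."""
    return sum(w * f for w, f in zip(weights, objs))

def chebyshev(objs, weights, ideal):
    """Weighted Chebyshev distance to the ideal point: max_i w_i * (z*_i - f_i) (lower is better)."""
    return max(w * (z - f) for w, z, f in zip(weights, ideal, objs))

def exp_temperature(step, total_steps, t_start=1.0, t_end=0.01):
    """Assumed exponential schedule: t_start at step 0, decaying geometrically to t_end at the last step."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac

def annealed_select(pool, weights, ideal, temperature, rng):
    """Boltzmann selection on Chebyshev scores: near-uniform when hot, near-greedy when cold."""
    scores = [chebyshev(objs, weights, ideal) for _, objs in pool]
    best = min(scores)
    # Subtract the best score before exponentiating for numerical stability.
    probs = [math.exp(-(s - best) / temperature) for s in scores]
    return rng.choices(pool, weights=probs, k=1)[0]

# Toy front with a non-convex (concave) middle: C lies below the line f1 + f2 = 1.
POOL = [("A", (1.0, 0.0)), ("B", (0.0, 1.0)), ("C", (0.4, 0.4))]
IDEAL = (1.0, 1.0)
WEIGHT_GRID = [(i / 10, 1 - i / 10) for i in range(11)]

# Winners under each scalarization as the weight vector sweeps the simplex.
ws_winners = {max(POOL, key=lambda p: weighted_sum(p[1], w))[0] for w in WEIGHT_GRID}
cheb_winners = {min(POOL, key=lambda p: chebyshev(p[1], w, IDEAL))[0] for w in WEIGHT_GRID}

if __name__ == "__main__":
    print("weighted-sum winners:", sorted(ws_winners))  # C never wins
    print("Chebyshev winners:", sorted(cheb_winners))   # C is reachable
    rng = random.Random(0)
    for step in range(0, 100, 25):
        t = exp_temperature(step, 100)
        # Hypothetical: draw a fresh normalized weight vector at every step.
        raw = (rng.random(), rng.random())
        w = (raw[0] / sum(raw), raw[1] / sum(raw))
        name, _ = annealed_select(POOL, w, IDEAL, t, rng)
        print(f"step {step:3d}  T={t:.3f}  picked {name}")
```

Point C is Pareto-optimal, since neither A nor B dominates it, yet no weighted sum selects it: its score is 0.4 for every weight vector, while the better of A and B always scores max(w1, w2) ≥ 0.5. The Chebyshev distance to the ideal point (1, 1) does select C at balanced weights (0.3 versus 0.5 for A and B). In this sketch the high early temperature spreads selection across the pool, and the cold late temperature concentrates it on the Chebyshev-best candidate.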