The Last Harness You'll Ever Build
April 22, 2026
Authors: Haebin Seong, Li Yin, Haoran Zhang
cs.AI
Abstract
AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling customer escalations that demand nuanced domain knowledge. Each new task domain requires painstaking, expert-driven harness engineering: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the Harness Evolution Loop optimizes a worker agent's harness H for a single task: a Worker Agent W_{H} executes the task, an Evaluator Agent V adversarially diagnoses failures and scores performance, and an Evolution Agent E modifies the harness based on the full history of prior attempts. At the second level, the Meta-Evolution Loop optimizes the evolution protocol Λ = (W_{H}, H^{(0)}, V, E) itself across diverse tasks, learning a protocol Λ^{(\text{best})} that enables rapid harness convergence on any new task -- so that adapting an agent to a novel domain requires no human harness engineering at all. We formalize the correspondence to meta-learning and present both algorithms. The framework shifts manual harness engineering into automated harness engineering, and takes one step further -- automating the design of the automation itself.
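To make the two-level structure concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm, which the abstract does not spell out. Everything in it is an illustrative assumption: a harness and a task are both modeled as sets of skill names, the worker "solves" whatever the harness covers, the evaluator reports the fraction covered plus the missing skills, and the evolver patches up to `k` gaps per step. The meta level is reduced to picking, from a fixed pool of hypothetical candidate protocols, the one with the best mean score across tasks under a fixed step budget.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, FrozenSet, List, Tuple

Skills = FrozenSet[str]                 # toy model: a harness H and a task are both skill sets
Attempt = Tuple[Skills, float, Skills]  # (harness, score, diagnosed gaps)


@dataclass(frozen=True)
class Protocol:
    """Toy stand-in for the evolution protocol Λ = (W_H, H^(0), V, E)."""
    worker: Callable[[Skills, Skills], Skills]
    initial_harness: Skills
    evaluator: Callable[[Skills, Skills], Tuple[float, Skills]]
    evolver: Callable[[Skills, List[Attempt]], Skills]


def worker(harness: Skills, task: Skills) -> Skills:
    # W_H (assumed): the worker completes only the parts its harness covers.
    return harness & task


def evaluator(output: Skills, task: Skills) -> Tuple[float, Skills]:
    # V (assumed): score the attempt and diagnose what is still missing.
    return len(output) / len(task), task - output


def make_evolver(k: int) -> Callable[[Skills, List[Attempt]], Skills]:
    # E (assumed): patch up to k diagnosed gaps per step. It receives the
    # full history, but this toy version reads only the latest entry.
    def evolver(harness: Skills, history: List[Attempt]) -> Skills:
        _, _, gaps = history[-1]
        return harness | frozenset(sorted(gaps)[:k])
    return evolver


def evolve_harness(protocol: Protocol, task: Skills, steps: int) -> Tuple[Skills, float]:
    """Level 1: run, evaluate, evolve; return the best harness and its score."""
    harness = protocol.initial_harness
    history: List[Attempt] = []
    best_harness, best_score = harness, -1.0
    for _ in range(steps):
        output = protocol.worker(harness, task)
        score, gaps = protocol.evaluator(output, task)
        history.append((harness, score, gaps))
        if score > best_score:
            best_harness, best_score = harness, score
        harness = protocol.evolver(harness, history)
    return best_harness, best_score


def meta_evolve(candidates: List[Protocol], tasks: List[Skills], steps: int) -> Protocol:
    """Level 2 (simplified): select the protocol with the best mean score across tasks."""
    return max(
        candidates,
        key=lambda p: mean(evolve_harness(p, t, steps)[1] for t in tasks),
    )


# Two hypothetical candidate protocols that differ only in the evolver.
slow = Protocol(worker, frozenset(), evaluator, make_evolver(1))
fast = Protocol(worker, frozenset(), evaluator, make_evolver(2))
tasks = [
    frozenset({"search", "extract", "synthesize", "cite"}),
    frozenset({"click", "fill_form"}),
]
best_protocol = meta_evolve([slow, fast], tasks, steps=3)
```

In this sketch the selected protocol is the one whose harnesses converge fastest within the step budget, which mirrors the abstract's goal of "rapid harness convergence on any new task". A fuller treatment would replace the selection step with an actual update of Λ, and the set operations with real agent calls.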