Adaptive Auto-Harness: Continual Self-Improvement for Agentic Systems Deployed on Open-Ended Task Streams

Abstract

Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmarks. Real deployments instead present open-ended task streams: histories grow without a fixed endpoint, heterogeneous tasks require different harnesses, and problem distributions shift over time. Under these conditions, a single harness that is updated densely and repeatedly becomes brittle: its accuracy peaks early and then declines. This motivates sustained harness construction combined with task-wise adaptation. We introduce Adaptive Auto-Harness, a framework and system for such streams. The framework decomposes the gap between the current harness and an oracle harness into an evolution loss and an adaptation loss. The system addresses these losses with a stateful multi-agent evolver, a harness tree with solve-time routing, and human-steering hooks for cases where the history lacks the needed signal. Across prediction-market, security-competition, and event-forecasting streams, Adaptive Auto-Harness outperforms five existing auto-harness baselines, and ablations attribute the gains to better construction, routing, or targeted human steering. Code is available at https://github.com/A-EVO-Lab/AdaptiveHarness.
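To make the split between evolution loss and adaptation loss concrete, here is a toy calculation. It is a minimal sketch under stated assumptions and is not taken from the released code. It assumes, hypothetically, that each task has a known oracle score, that the harness tree is a set of named nodes tagged with the task categories they specialise in, and that solve-time routing simply descends to a matching child. Under these assumptions, evolution loss is the shortfall of the best harness already in the tree relative to the oracle, adaptation loss is the shortfall of the routed harness relative to that best harness, and the two add up to the total gap by construction. `HarnessNode`, `route`, `decompose_gap`, and all scores are illustrative inventions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HarnessNode:
    """Hypothetical harness-tree node: one harness plus more specialised children."""
    name: str
    tags: frozenset[str] = frozenset()  # task categories this node specialises in
    children: list[HarnessNode] = field(default_factory=list)

    def harnesses(self) -> list[str]:
        """Return every harness name in this subtree (pre-order)."""
        names = [self.name]
        for child in self.children:
            names.extend(child.harnesses())
        return names


def route(root: HarnessNode, category: str) -> str:
    """Assumed solve-time rule: descend while some child specialises in the category."""
    node = root
    while True:
        match = next((c for c in node.children if category in c.tags), None)
        if match is None:
            return node.name
        node = match


def decompose_gap(root, tasks, scores, oracle):
    """Split the mean gap to the oracle into (evolution loss, adaptation loss).

    tasks:  list of (task_id, category) pairs
    scores: scores[harness][task_id] -> score of that harness on that task
    oracle: oracle[task_id] -> best achievable score (assumed known, for illustration)
    """
    available = root.harnesses()
    evolution = adaptation = 0.0
    for task_id, category in tasks:
        best_in_tree = max(scores[h][task_id] for h in available)
        routed = scores[route(root, category)][task_id]
        evolution += oracle[task_id] - best_in_tree  # no built harness reaches the oracle
        adaptation += best_in_tree - routed          # a better built harness was not chosen
    n = len(tasks)
    return evolution / n, adaptation / n


# Toy data (made up for illustration only).
root = HarnessNode("general", children=[
    HarnessNode("markets", frozenset({"market"})),
    HarnessNode("security", frozenset({"ctf"})),
])
tasks = [("t1", "market"), ("t2", "ctf"), ("t3", "forecast"), ("t4", "market")]
scores = {
    "general":  {"t1": 0.5,  "t2": 0.25, "t3": 0.5,  "t4": 0.25},
    "markets":  {"t1": 0.75, "t2": 0.0,  "t3": 0.75, "t4": 0.5},
    "security": {"t1": 0.25, "t2": 0.5,  "t3": 0.25, "t4": 0.0},
}
oracle = {"t1": 1.0, "t2": 0.5, "t3": 0.75, "t4": 0.75}

if __name__ == "__main__":
    print(decompose_gap(root, tasks, scores, oracle))  # → (0.125, 0.0625)
```

In the toy numbers, task t3 (category "forecast") falls through to the root harness even though the tree already holds a better one, which shows up as adaptation loss; tasks t1 and t4 are routed to the market harness but it still trails the oracle, which shows up as evolution loss.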