Adaptive Auto-Harness: Sustained Self-Improvement for Agentic Systems Deployed on Open-Ended Task Streams

Abstract

Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmarks. Real deployments instead present open-ended task streams: histories grow without a fixed endpoint, heterogeneous tasks require different harnesses, and problem distributions shift over time. Under these conditions, a single harness that is updated repeatedly and densely becomes brittle: accuracy peaks early and then declines. This motivates sustained harness construction with task-wise adaptation. We introduce Adaptive Auto-Harness, a framework and system for such streams. The framework decomposes the gap to an oracle harness into evolution loss and adaptation loss. The system addresses these losses with a stateful multi-agent evolver, a harness tree with solve-time routing, and human-steering hooks for cases where history lacks the needed signal. Across prediction-market, security-competition, and event-forecasting streams, Adaptive Auto-Harness outperforms five existing auto-harness baselines, and ablations attribute the gains to better construction, routing, or targeted human steering. Code is available at https://github.com/A-EVO-Lab/AdaptiveHarness.
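
One way to read the decomposition of the gap to an oracle harness is as a telescoping split around the best harness currently in the tree. The sketch below is an illustrative assumption, not the paper's formal definition: the function name `decompose_gap`, the harness names, and the scores are all hypothetical. It treats evolution loss as the shortfall of the best constructed harness relative to the oracle, and adaptation loss as the shortfall of the routed harness relative to that best constructed one.

```python
from typing import Dict, Tuple


def decompose_gap(oracle_score: float,
                  tree_scores: Dict[str, float],
                  routed: str) -> Tuple[float, float]:
    """Split (oracle score - routed score) into two parts.

    Hypothetical reading of the abstract, not the authors' definition:
      evolution loss  = oracle score - best score among constructed harnesses
      adaptation loss = best constructed score - score of the routed harness
    """
    best_available = max(tree_scores.values())
    evolution_loss = oracle_score - best_available
    adaptation_loss = best_available - tree_scores[routed]
    return evolution_loss, adaptation_loss


# Made-up per-task scores for three hypothetical harnesses in the tree.
scores = {"market": 0.5, "security": 0.75, "forecast": 0.25}
evo, ada = decompose_gap(oracle_score=1.0, tree_scores=scores, routed="market")

# The two parts sum to the total gap between the oracle and the routed harness.
total_gap = 1.0 - scores["market"]
```

Under this reading, better harness construction shrinks the first term and better solve-time routing shrinks the second, which is consistent with how the ablations attribute the gains.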