Adaptive Auto-Harness: Sustained Self-Improvement for Deploying Agentic Systems on Open-Ended Task Streams

Abstract

Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmarks. Real deployments instead present open-ended task streams: histories grow without a fixed endpoint, heterogeneous tasks require different harnesses, and problem distributions shift over time. Under these conditions, a single harness that is repeatedly and densely updated becomes brittle: accuracy peaks early and then declines. This motivates sustained harness construction with task-wise adaptation. We introduce Adaptive Auto-Harness, a framework and system for such streams. The framework decomposes the gap to an oracle harness into an evolution loss and an adaptation loss. The system addresses these losses with a stateful multi-agent evolver, a harness tree with solve-time routing, and human-steering hooks for cases where the history lacks the needed signal. Across prediction-market, security-competition, and event-forecasting streams, Adaptive Auto-Harness outperforms five existing auto-harness baselines, and ablations attribute the gains to better construction, routing, or targeted human steering. Code is available at https://github.com/A-EVO-Lab/AdaptiveHarness.