HarnessX: A Foundry for Composable, Adaptive, and Evolvable Agent Harnesses

Abstract

AI agent performance depends critically on the runtime harness, comprising the prompts, tools, memory, and control flow that mediate how a model observes, reasons, and acts. Yet today's harnesses remain largely hand-crafted and static: each new model or task still demands bespoke scaffolding, and the rich traces produced during execution are rarely distilled back into systematic improvement. We introduce HarnessX, a foundry for composable, adaptive, and evolvable agent harnesses. HarnessX assembles typed harness primitives via a substitution algebra, adapts them through AEGIS, a trace-driven multi-agent evolution engine grounded in an operational mirror between symbolic adaptation and reinforcement learning, and closes the harness-model loop by turning trajectories into both harness updates and model training signal. Across five benchmarks (ALFWorld, GAIA, WebShop, tau^3-Bench, and SWE-bench Verified), HarnessX yields an average gain of +14.5% (up to +44.0%), with gains largest where baselines are lowest. These results suggest that agent progress need not come from model scaling alone: composing and evolving runtime interfaces from execution feedback is an actionable and complementary lever. The complete codebase will be open-sourced in a future release.
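
The abstract does not spell out the substitution algebra, and the codebase is not yet released, so the following is only a minimal sketch of what composing typed harness primitives by substitution could look like. Every name here (`Harness`, `substitute`, the three slots) is hypothetical and chosen for illustration; none of it is HarnessX's actual API.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Harness:
    """Hypothetical harness with three typed slots (not the HarnessX API)."""
    prompt: str
    tools: tuple
    memory_size: int


def substitute(harness: Harness, **updates) -> Harness:
    """Return a new harness with the named slots replaced.

    A substitution is accepted only if the slot exists and the new value
    has the same type as the value it replaces (an assumed typing rule).
    """
    slot_names = {f.name for f in fields(harness)}
    for name, value in updates.items():
        if name not in slot_names:
            raise KeyError(f"unknown slot: {name}")
        current = getattr(harness, name)
        if type(value) is not type(current):
            raise TypeError(
                f"slot {name!r} expects {type(current).__name__}, "
                f"got {type(value).__name__}"
            )
    # The dataclass is frozen, so replace() builds a fresh instance
    # and leaves the original harness untouched.
    return replace(harness, **updates)


base = Harness(prompt="You are a careful agent.", tools=("search",), memory_size=4)
evolved = substitute(base, tools=("search", "browser"))
```

Keeping the harness immutable means each adaptation step yields a new candidate that can be evaluated against the one it was derived from, which is one plausible way to support the trace-driven evolution the abstract describes.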