HarnessForge: Adaptive Agent Systems through Harness–Policy Co-Evolution

Abstract



LLM agents are increasingly expected to operate across heterogeneous task regimes that require distinct execution paradigms. This challenges fixed agent systems and motivates system-level meta-adaptation that goes beyond isolated component updates. Existing works adapt either the external harness or the underlying reasoning policy, but full-system adaptation remains insufficiently characterized. The adaptation space between structure and execution is rarely made explicit, and the compatibility between the external harness and the internal reasoner is not optimized jointly. We propose HarnessForge, a meta-adaptive framework for evolving LLM agent systems. HarnessForge formulates an agent system as a harness–policy pair, defining a stable adaptation space that separates harness-level execution structure from policy-level reasoning behavior. It then performs harness–policy co-evolution through fault-guided harness tailoring and harness-conditioned policy alignment. Experiments on five benchmarks from diverse domains show that HarnessForge consistently improves both the Qwen3-4B and Qwen3-8B backbones. It outperforms harness-only and policy-only baselines, with gains of up to 12.0% over the strongest baseline, while achieving favorable rollout-efficiency tradeoffs. These results demonstrate that harness–policy co-evolution is effective and that executable compatibility between the harness and the reasoning policy is essential for agent-system adaptation. The code is available at https://github.com/mingju-c/HarnessForge.