HarnessForge: Joint Evolution of Harness and Policy in Adaptive Agent Systems

Abstract


LLM agents are increasingly expected to operate across heterogeneous task regimes that require distinct execution paradigms. This challenges fixed agent systems and motivates system-level meta-adaptation that goes beyond updating isolated components. While existing works have adapted the external harness or trained the underlying reasoning policy, full-system adaptation remains insufficiently characterized: the adaptation space between structure and execution is rarely made explicit, and the compatibility between the external harness and the internal reasoner is not optimized jointly. We propose HarnessForge, a meta-adaptive framework for evolving LLM agent systems. HarnessForge formulates an agent system as a harness-policy pair, defining a stable adaptation space that separates harness-level execution structure from policy-level reasoning behavior. It then performs harness-policy co-evolution through fault-guided harness tailoring and harness-conditioned policy alignment. Experiments on five benchmarks from diverse domains show that HarnessForge consistently improves both the Qwen3-4B and Qwen3-8B backbones, outperforms harness-only and policy-only baselines with gains of up to 12.0% over the strongest baseline, and achieves favorable rollout-efficiency tradeoffs. These results demonstrate that harness-policy co-evolution is effective and that executable compatibility between the harness and the reasoning policy is essential for agent-system adaptation. The code is available at https://github.com/mingju-c/HarnessForge.
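As a rough illustration of what alternating harness and policy updates might look like, the toy Python sketch below rests entirely on assumptions that the abstract does not state. The harness is modeled as the set of tools the execution scaffold exposes, and the policy as a lookup table from task type to the tool the reasoner calls. Each failed rollout is attributed either to the harness (tool missing) or to the policy (tool present but not used), and the two update steps simply alternate. The function names (`evaluate`, `tailor_harness`, `align_policy`, `co_evolve`) are hypothetical and do not reflect HarnessForge's actual code or algorithm.

```python
from collections import Counter

# Toy stand-ins (hypothetical, not HarnessForge's actual representations):
#   harness: the set of tools the execution scaffold exposes
#   policy:  a mapping task_type -> tool the reasoner chooses to call
#   tasks:   (task_type, tool actually needed) pairs
Harness = frozenset[str]
Policy = dict[str, str]
Task = tuple[str, str]


def evaluate(harness: Harness, policy: Policy, tasks: list[Task]) -> tuple[float, Counter]:
    """Run one toy rollout per task and attribute each failure to a layer."""
    faults: Counter = Counter()
    solved = 0
    for task_type, needed in tasks:
        if needed not in harness:
            faults[("harness", needed)] += 1    # structure lacks the needed tool
        elif policy.get(task_type) != needed:
            faults[("policy", task_type)] += 1  # tool exists, reasoner does not use it
        else:
            solved += 1
    return solved / len(tasks), faults


def tailor_harness(harness: Harness, faults: Counter, budget: int = 1) -> Harness:
    """Fault-guided step (assumed): add the tools blamed by the most harness faults."""
    missing = [(count, tool) for (layer, tool), count in faults.items() if layer == "harness"]
    missing.sort(key=lambda ct: (-ct[0], ct[1]))  # most frequent first, ties broken by name
    return harness | {tool for _, tool in missing[:budget]}


def align_policy(harness: Harness, policy: Policy, tasks: list[Task]) -> Policy:
    """Harness-conditioned step (assumed): per task type, pick the best tool in this harness."""
    aligned = dict(policy)
    for task_type in {t for t, _ in tasks}:
        scores = {tool: sum(1 for t, need in tasks if t == task_type and need == tool)
                  for tool in sorted(harness)}
        best = max(scores, key=scores.get, default=None)
        if best is not None and scores[best] > 0:  # leave unset if no exposed tool helps
            aligned[task_type] = best
    return aligned


def co_evolve(harness: Harness, policy: Policy, tasks: list[Task], rounds: int = 5):
    """Alternate evaluation, harness tailoring, and policy alignment."""
    history = []
    for _ in range(rounds):
        score, faults = evaluate(harness, policy, tasks)
        history.append(score)
        if score == 1.0:
            break
        harness = tailor_harness(harness, faults)
        policy = align_policy(harness, policy, tasks)
    return harness, policy, history


if __name__ == "__main__":
    tasks = [("math", "python"), ("math", "python"), ("web", "search"), ("qa", "retriever")]
    harness, policy, history = co_evolve(frozenset({"python"}), {}, tasks)
    print(history)          # [0.0, 0.75, 1.0]
    print(sorted(harness))  # ['python', 'retriever', 'search']
```

In this toy, aligning the policy alone against the fixed `{"python"}` harness caps the score at 0.5, because no policy can call a tool the harness does not expose. This loosely mirrors, in a highly simplified setting, the abstract's argument for adapting both layers together rather than either one in isolation.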