ChatPaper.aiChatPaper

HarnessX:一個可組合、自適應及可演化的代理框架工廠

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

June 12, 2026
作者: Tingyang Chen, Shuo Lu, Kang Zhao, Weicheng Meng, Hanlin Teng, Tianhao Li, Chao Li, Xule Liu, Jian Liang, Zhizhong Zhang, Yuan Xie, Heng Qu, Kun Shao, Jian Luan
cs.AI

摘要

AI智能體的效能關鍵取決於運行時框架,該框架包含提示、工具、記憶以及控制流程,這些元素調節著模型如何觀察、推理和行動。然而,當今的框架大多仍依賴手工設計且靜態:每個新模型或新任務仍需定制化的支撐結構,而執行過程中產生的豐富軌跡鮮少被提煉為系統性的改進。我們提出 HarnessX,一個用於可組合、自適應和可演化的智能體框架的打造平台。HarnessX 通過替換代數組裝類型化框架原語,通過 AEGIS(一個基於軌跡的多智能體演化引擎,其基礎是符號適應與強化學習之間的運作映射)進行自適應,並通過將軌跡轉化為框架更新和模型訓練信號來閉合框架-模型循環。在五個基準測試(ALFWorld、GAIA、WebShop、tau^3-Bench 和 SWE-bench Verified)中,HarnessX 平均提升了 +14.5%(最高 +44.0%),在基線最低的測試中提升最大。這些結果表明,智能體的進步不必僅來自模型規模擴展:從執行反饋中組合和演化運行時接口是一個可操作且互補的槓桿。完整的程式碼庫將在未來版本中開源。
English
AI agent performance depends critically on the runtime harness, comprising the prompts, tools, memory, and control flow that mediate how a model observes, reasons, and acts. Yet today's harnesses remain largely hand-crafted and static: each new model or task still demands bespoke scaffolding, and the rich traces produced during execution are rarely distilled back into systematic improvement. We introduce HarnessX, a foundry for composable, adaptive, and evolvable agent harnesses. HarnessX assembles typed harness primitives via a substitution algebra, adapts them through AEGIS, a trace-driven multi-agent evolution engine grounded in an operational mirror between symbolic adaptation and reinforcement learning, and closes the harness-model loop by turning trajectories into both harness updates and model training signal. Across five benchmarks (ALFWorld, GAIA, WebShop, tau^3-Bench, and SWE-bench Verified), HarnessX yields an average gain of +14.5% (up to +44.0%), with gains largest where baselines are lowest. These results suggest that agent progress need not come from model scaling alone: composing and evolving runtime interfaces from execution feedback is an actionable and complementary lever. The complete codebase will be open-sourced in a future release.