HarnessX: A Foundry for Composable, Adaptive, and Evolvable Agent Harnesses

Abstract



AI agent performance depends critically on the runtime harness, comprising the prompts, tools, memory, and control flow that mediate how a model observes, reasons, and acts. Yet today's harnesses remain largely hand-crafted and static: each new model or task still demands bespoke scaffolding, and the rich traces produced during execution are rarely distilled back into systematic improvement. We introduce HarnessX, a foundry for composable, adaptive, and evolvable agent harnesses. HarnessX assembles typed harness primitives via a substitution algebra; adapts them through AEGIS, a trace-driven multi-agent evolution engine grounded in an operational mirror between symbolic adaptation and reinforcement learning; and closes the harness-model loop by turning trajectories into both harness updates and training signal for the model. Across five benchmarks (ALFWorld, GAIA, WebShop, τ³-Bench, and SWE-bench Verified), HarnessX yields an average gain of +14.5% (up to +44.0%), with the largest gains where baselines are lowest. These results suggest that agent progress need not come from model scaling alone: composing and evolving runtime interfaces from execution feedback is an actionable and complementary lever. The complete codebase will be open-sourced in a future release.