HarnessBridge: A Learnable Bidirectional Controller for LLM Agent Control

Abstract

Large language models are increasingly deployed as agents for long-horizon tasks, yet their performance is shaped not only by model capability and environment design, but also by the harness that mediates agent-environment interaction. Existing harnesses are largely hand-engineered, which makes them difficult to scale as trajectories grow longer and interactions become more complex. In this work, we ask whether a harness can instead be generated by a learnable plug-in module trained end to end. We introduce HarnessBridge, a lightweight learnable harness controller that parameterizes the agent-environment interface as a bidirectional projection. HarnessBridge learns two complementary projections: an observation projection, which distills raw trajectories into compact, decision-relevant states, and an action projection, which converts proposed actions into executable transitions or trajectory-grounded rejections. We train HarnessBridge on a harness supervision dataset via unified instruction tuning. On Terminal-Bench 2.0 and SWE-bench Verified, HarnessBridge matches or surpasses strong specialized harnesses while substantially reducing token usage and trajectory length, and it generalizes from smaller generators to larger commercial models.
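To make the shape of the two projections concrete, the following is a minimal sketch of the interface only, not the authors' implementation. The learned controller is replaced here by hand-written rules, and every name (`ToyHarness`, `Step`, `Transition`, `Rejection`, `project_observation`, `project_action`) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Step:
    action: str        # command the agent executed
    observation: str   # raw environment output for that command


@dataclass
class Transition:
    command: str       # executable form of the proposed action


@dataclass
class Rejection:
    reason: str        # explanation grounded in an earlier trajectory step


class ToyHarness:
    """Rule-based stand-in for a learned harness controller (hypothetical)."""

    def __init__(self, max_chars: int = 80, keep_last: int = 2):
        self.max_chars = max_chars
        self.keep_last = keep_last

    def project_observation(self, trajectory: List[Step]) -> str:
        # Observation direction: keep only recent steps and truncate long outputs,
        # yielding a compact state for the agent's next decision.
        recent = trajectory[-self.keep_last:]
        lines = [f"$ {s.action} -> {s.observation[:self.max_chars]}" for s in recent]
        return "\n".join(lines)

    def project_action(
        self, proposed: str, trajectory: List[Step]
    ) -> Union[Transition, Rejection]:
        # Action direction: normalize the proposal, then either pass it through
        # or reject it with a reason that points at a concrete earlier step.
        command = proposed.strip()
        for i, s in enumerate(trajectory):
            if s.action == command and "error" in s.observation.lower():
                return Rejection(f"step {i} already ran '{command}' and failed")
        return Transition(command)


trajectory = [
    Step("ls", "src tests README.md"),
    Step("pytest -q", "ERROR: file not found: tests/test_x.py"),
    Step("cat README.md", "x" * 500),
]
harness = ToyHarness()
state = harness.project_observation(trajectory)             # compact two-line state
retry = harness.project_action("  pytest -q ", trajectory)  # repeats a failed step
fresh = harness.project_action("ls tests", trajectory)      # new, executable action
```

In this toy version, `retry` is a `Rejection` that cites the earlier failed step, while `fresh` is a `Transition`. In the paper's setting both directions would presumably be produced by the trained controller rather than by fixed rules.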