HarnessBridge: A Learnable Bidirectional Controller for LLM Agent Harnesses

Abstract

Large language models are increasingly deployed as agents for long-horizon tasks, yet their performance is shaped not only by model capability and environment design but also by the harness that mediates agent-environment interaction. Existing harnesses are largely hand-engineered, which makes them difficult to scale as trajectories grow longer and interactions become more complex. In this work, we ask whether a harness can instead be realized as a learnable plug-in module that is trained end to end. We introduce HarnessBridge, a lightweight learnable harness controller that parameterizes the agent-environment interface as a bidirectional projection. HarnessBridge learns one projection in each direction: the observation projection distills raw trajectories into compact, decision-relevant states, and the action projection converts proposed actions into either executable transitions or trajectory-grounded rejections. We train HarnessBridge on a harness supervision dataset via unified instruction tuning. On Terminal-Bench 2.0 and SWE-bench Verified, HarnessBridge matches or surpasses strong specialized harnesses while substantially reducing token usage and trajectory length, and it generalizes from smaller generators to larger commercial models.
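
To make the two projection directions concrete, the following is a minimal sketch of the interface shape only. It is not the HarnessBridge implementation: the class and method names (`ToyHarnessController`, `project_observation`, `project_action`), the `FAILED: ` event format, and the hand-written rules are all assumptions introduced for illustration, and they stand in for what the paper describes as a learned controller.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Transition:
    """An action the environment can execute as-is."""
    command: str


@dataclass
class Rejection:
    """A refusal, optionally citing the trajectory step that motivates it."""
    reason: str
    evidence_step: Optional[int] = None


class ToyHarnessController:
    """Rule-based stand-in for a learned controller (hypothetical API)."""

    def __init__(self, max_events: int = 3, max_chars: int = 80):
        self.max_events = max_events
        self.max_chars = max_chars

    def project_observation(self, trajectory: List[str]) -> str:
        # Observation direction: keep only the most recent events and
        # clip each one, yielding a compact state for the agent.
        recent = trajectory[-self.max_events:]
        return "\n".join(event[: self.max_chars] for event in recent)

    def project_action(
        self, proposed: str, trajectory: List[str]
    ) -> Union[Transition, Rejection]:
        # Action direction: either pass the action through as an
        # executable transition or reject it with trajectory evidence.
        command = proposed.strip()
        if not command:
            return Rejection("empty action")
        for step, event in enumerate(trajectory):
            # Assumed event format: a failed command is logged as "FAILED: <cmd>".
            if event == f"FAILED: {command}":
                return Rejection(f"'{command}' already failed", evidence_step=step)
        return Transition(command)


# Example usage on a three-step toy trajectory.
trajectory = ["ran: ls", "FAILED: make test", "ran: cat Makefile"]
controller = ToyHarnessController(max_events=2)
state = controller.project_observation(trajectory)
rejected = controller.project_action("make test", trajectory)
accepted = controller.project_action("pytest -q", trajectory)
```

In this toy version the rejection carries the index of the trajectory step that justifies it, which is one plausible reading of "trajectory-grounded"; a learned controller would replace both hand-written rules with model outputs.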