Code as Agent Harness

Abstract

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substrate for agent reasoning, acting, environment modeling, and execution-based verification. We frame this shift through the lens of agent harnesses and introduce code as agent harness: a unified view that centers code as the basis for agent infrastructure. To study this perspective systematically, we organize the survey around three connected layers. First, we study the harness interface, where code connects agents to reasoning, action, and environment modeling. Second, we examine harness mechanisms: planning, memory, and tool use for long-horizon execution, together with feedback-driven control and optimization that make the harness reliable and adaptive. Third, we discuss scaling the harness from single-agent systems to multi-agent settings, where shared code artifacts support coordination, review, and verification among agents. Across these layers, we summarize representative methods and practical applications of code as agent harness, spanning coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows. We further outline open challenges for harness engineering, including evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across multiple agents, human oversight for safety-critical actions, and extensions to multimodal environments. By centering code as the harness of agentic AI, this survey provides a unified roadmap toward executable, verifiable, and stateful AI agent systems.