Code as Agent Harness

Abstract

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output: it increasingly serves as an operational substrate for agent reasoning, acting, environment modeling, and execution-based verification. We frame this shift through the lens of agent harnesses and introduce code as agent harness, a unified view that centers code as the basis of agent infrastructure. To study this perspective systematically, we organize the survey around three connected layers. First, we study the harness interface, where code connects agents to reasoning, action, and environment modeling. Second, we examine harness mechanisms: planning, memory, and tool use for long-horizon execution, together with the feedback-driven control and optimization that make the harness reliable and adaptive. Third, we discuss scaling the harness from single-agent systems to multi-agent settings, where shared code artifacts support multi-agent coordination, review, and verification. Across these layers, we summarize representative methods and practical applications of code as agent harness, spanning coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows. We further outline open challenges for harness engineering, including evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across multiple agents, human oversight of safety-critical actions, and extension to multimodal environments. By centering code as the harness of agentic AI, this survey provides a unified roadmap toward executable, verifiable, and stateful AI agent systems.