Code as Agent Harness

Abstract

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substrate for agent reasoning, acting, environment modeling, and execution-based verification. We frame this shift through the lens of agent harnesses and introduce code as agent harness: a unified view that centers code as the basis for agent infrastructure. To study this perspective systematically, we organize the survey around three connected layers. First, we study the harness interface, where code connects agents to reasoning, action, and environment modeling. Second, we examine harness mechanisms: planning, memory, and tool use for long-horizon execution, together with feedback-driven control and optimization that make the harness reliable and adaptive. Third, we discuss scaling the harness from single-agent systems to multi-agent settings, where shared code artifacts support multi-agent coordination, review, and verification. Across these layers, we summarize representative methods and practical applications of code as agent harness, spanning coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows. We further outline open challenges for harness engineering, including evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across multiple agents, human oversight for safety-critical actions, and extensions to multimodal environments. By centering code as the harness of agentic AI, this survey provides a unified roadmap toward executable, verifiable, and stateful AI agent systems.