Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Abstract

Agentic coding tools receive goals written in natural language as input, break them down into specific tasks, and write or execute the actual code with minimal human intervention. Central to this process are agent context files ("READMEs for agents") that provide persistent, project-level instructions. In this paper, we conduct the first large-scale empirical study of 2,303 agent context files from 1,925 repositories to characterize their structure, maintenance, and content. We find that these files are not static documentation but complex, difficult-to-read artifacts that evolve like configuration code, maintained through frequent, small additions. Our content analysis of 16 instruction types shows that developers prioritize functional context, such as build and run commands (62.3%), implementation details (69.9%), and architecture (67.7%). We also identify a significant gap: non-functional requirements like security (14.5%) and performance (14.5%) are rarely specified. These findings indicate that while developers use context files to make agents functional, they provide few guardrails to ensure that agent-written code is secure or performant, highlighting the need for improved tooling and practices.
