Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time

Abstract

The rise of large language models for code has reshaped software development. Autonomous coding agents that can create branches, open pull requests, and perform code reviews now actively contribute to real-world projects. Their growing role offers a timely opportunity to investigate AI-driven contributions and their effects on code quality, team dynamics, and software maintainability. In this work, we construct a novel dataset of approximately 110,000 open-source pull requests, together with their associated commits, comments, reviews, issues, and file changes, collectively representing millions of lines of source code. We compare five popular coding agents (OpenAI Codex, Claude Code, GitHub Copilot, Google Jules, and Devin) and examine how their usage differs across development aspects such as merge frequency, edited file types, and developer interaction signals, including comments and reviews. Code authoring and review are only a small part of the larger software engineering process, because the resulting code must also be maintained and updated over time. We therefore provide several longitudinal estimates of survival and churn rates for agent-generated versus human-authored code. Our findings indicate that agent activity in open-source projects is increasing, but that agent contributions are associated with more churn over time than human-authored code.
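To make the notions of survival and churn concrete, the following is a minimal sketch under stated assumptions, not the estimator used in this work. It assumes a line "survives" a horizon if it has not been modified or deleted within that many days of being merged, that churn is the complement of survival, and that every line has been observed for at least the full horizon. The `TrackedLine` record, the function names, and the toy values are hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackedLine:
    """One line added by a merged pull request (hypothetical record layout)."""
    author_kind: str                   # "agent" or "human"
    days_until_changed: Optional[int]  # None if never modified or deleted while observed


def survival_rate(lines: List[TrackedLine], author_kind: str, horizon_days: int) -> float:
    """Fraction of lines by `author_kind` still unchanged after `horizon_days`.

    Assumes every line was observed for at least `horizon_days`.
    """
    cohort = [ln for ln in lines if ln.author_kind == author_kind]
    if not cohort:
        raise ValueError(f"no lines for author kind {author_kind!r}")
    survived = sum(
        1
        for ln in cohort
        if ln.days_until_changed is None or ln.days_until_changed > horizon_days
    )
    return survived / len(cohort)


def churn_rate(lines: List[TrackedLine], author_kind: str, horizon_days: int) -> float:
    """Fraction of lines by `author_kind` changed within `horizon_days` (assumed definition)."""
    return 1.0 - survival_rate(lines, author_kind, horizon_days)


# Toy values for illustration only; these are not results from the dataset.
toy_lines = [
    TrackedLine("agent", 5),
    TrackedLine("agent", 40),
    TrackedLine("agent", None),
    TrackedLine("agent", 12),
    TrackedLine("human", 90),
    TrackedLine("human", None),
    TrackedLine("human", 10),
    TrackedLine("human", None),
]

if __name__ == "__main__":
    for kind in ("agent", "human"):
        print(kind, survival_rate(toy_lines, kind, 30), churn_rate(toy_lines, kind, 30))
```

A fixed-horizon ratio like this is only valid when all lines share the same observation window. With uneven windows, a censoring-aware estimator such as Kaplan-Meier would be the more appropriate choice.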