AIDev: Studying AI Coding Agents on GitHub
February 9, 2026
Authors: Hao Li, Haoxiang Zhang, Ahmed E. Hassan
cs.AI
Abstract
AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer productivity, and human-AI collaboration in the new era of software engineering.
> AI Agent, Agentic AI, Coding Agent, Agentic Coding, Agentic Software Engineering, Agentic Engineering
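To put the headline counts in proportion, the following minimal sketch derives a few averages and shares from the numbers quoted in the abstract. The ratios are computed here for illustration and are not figures reported by the authors; the names `FULL`, `CURATED`, and `summarize` are hypothetical, and the per-developer value is only a mean over the developers involved, not a statement about how PRs are distributed among them.

```python
# Counts below are copied from the abstract; the ratios are derived here
# and are not figures reported by the authors.
FULL = {"prs": 932_791, "repos": 116_211, "developers": 72_189}
CURATED = {"prs": 33_596, "repos": 2_807}  # repositories with over 100 stars


def summarize(full: dict, curated: dict) -> dict:
    """Return simple averages and shares derived from the headline counts."""
    return {
        "prs_per_repo": full["prs"] / full["repos"],
        "prs_per_developer": full["prs"] / full["developers"],
        "curated_prs_per_repo": curated["prs"] / curated["repos"],
        "curated_pr_share": curated["prs"] / full["prs"],
        "curated_repo_share": curated["repos"] / full["repos"],
    }


if __name__ == "__main__":
    for name, value in summarize(FULL, CURATED).items():
        print(f"{name}: {value:.4f}")
```

Under these counts, the full dataset averages roughly 8 Agentic-PRs per repository, and the curated subset accounts for about 3.6% of all Agentic-PRs while covering about 2.4% of the repositories.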