Agentic Refactoring: An Empirical Study of AI Coding Agents

Abstract

Agentic coding tools such as OpenAI Codex, Claude Code, and Cursor are transforming software engineering. These AI-powered systems act as autonomous teammates that can plan and execute complex development tasks. Agents have also become active participants in refactoring, a cornerstone of sustainable software development that aims to improve internal code quality without altering observable behavior. Despite their growing adoption, there remains a critical lack of empirical understanding of how agentic refactoring is used in practice, how it compares to human-driven refactoring, and what impact it has on code quality. To address this gap, we present a large-scale study of AI agent-generated refactorings in real-world open-source Java projects, analyzing 15,451 refactoring instances across 12,256 pull requests and 14,988 commits drawn from the AIDev dataset. Our analysis shows that refactoring is a common and intentional activity in this development paradigm: agents explicitly target refactoring in 26.1% of commits. The distribution of refactoring types shows that agentic efforts are dominated by low-level, consistency-oriented edits such as Change Variable Type (11.8%), Rename Parameter (10.4%), and Rename Variable (8.5%), reflecting a preference for localized improvements over the high-level design changes common in human refactoring. The motivations behind agentic refactoring focus overwhelmingly on internal quality concerns, led by maintainability (52.5%) and readability (28.1%). Finally, a quantitative evaluation of code quality metrics shows that agentic refactoring yields small but statistically significant improvements in structural metrics, particularly for medium-sized changes, reducing class size and complexity (e.g., Class LOC median Δ = -15.25).
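The abstract reports metric effects as median deltas (e.g., Class LOC median Δ = -15.25) together with statistical significance. As a minimal sketch, assuming each refactored class contributes one paired (before, after) metric value, the median Δ can be computed as below. The abstract does not name the significance test that was used; the exact two-sided sign test here is a stand-in chosen only for illustration, and the function names and sample values are hypothetical.

```python
import math
from statistics import median


def median_delta(before, after):
    """Median of paired differences (after - before).

    A negative result means the metric (e.g., class LOC) decreased
    for the typical refactored class.
    """
    if len(before) != len(after):
        raise ValueError("before and after must be paired, equal-length lists")
    deltas = [a - b for b, a in zip(before, after)]
    return median(deltas)


def sign_test_p(before, after):
    """Exact two-sided sign test on paired values (illustrative stand-in).

    Zero differences (unchanged values) are discarded. Under the null
    hypothesis, decreases and increases are equally likely, so the count
    of decreases follows Binomial(n, 0.5).
    """
    deltas = [a - b for b, a in zip(before, after) if a != b]
    n = len(deltas)
    if n == 0:
        return 1.0  # no evidence of change at all
    decreases = sum(1 for d in deltas if d < 0)
    tail = min(decreases, n - decreases)
    # P(X <= tail), doubled for a two-sided test and capped at 1.
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical class-LOC values before and after agentic refactoring.
    before = [120, 340, 95, 410, 260, 180]
    after = [110, 300, 95, 395, 258, 176]
    print("median delta:", median_delta(before, after))
    print("sign-test p:", sign_test_p(before, after))
```

With these toy values the median Δ is negative (LOC shrank), but with only five non-zero differences the sign test cannot reach conventional significance. A study-scale sample is what allows small effects like those reported above to become statistically significant.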
