Agentic Refactoring: An Empirical Study of AI Coding Agents
November 6, 2025
Authors: Kosei Horikawa, Hao Li, Yutaro Kashiwa, Bram Adams, Hajimu Iida, Ahmed E. Hassan
cs.AI
Abstract
Agentic coding tools such as OpenAI Codex, Claude Code, and Cursor are transforming the software engineering landscape. These AI-powered systems act as autonomous teammates that can plan and execute complex development tasks. Agents have also become active participants in refactoring, a cornerstone of sustainable software development that improves internal code quality without altering observable behavior. Despite their increasing adoption, there is a critical lack of empirical understanding of how agentic refactoring is used in practice, how it compares to human-driven refactoring, and what impact it has on code quality. To address this gap, we present a large-scale study of AI agent-generated refactorings in real-world open-source Java projects, analyzing 15,451 refactoring instances across 12,256 pull requests and 14,988 commits derived from the AIDev dataset. Our analysis shows that refactoring is a common and intentional activity in this development paradigm: agents explicitly target refactoring in 26.1% of commits. Analysis of refactoring types reveals that agentic efforts are dominated by low-level, consistency-oriented edits such as Change Variable Type (11.8%), Rename Parameter (10.4%), and Rename Variable (8.5%). This reflects a preference for localized improvements over the high-level design changes common in human refactoring. The motivations behind agentic refactoring focus overwhelmingly on internal quality concerns, with maintainability (52.5%) and readability (28.1%) as the dominant drivers. Finally, a quantitative evaluation of code quality metrics shows that agentic refactoring yields small but statistically significant improvements in structural metrics. These improvements are strongest for medium-level changes, where refactoring reduces class size and complexity (e.g., Class LOC median Δ = -15.25).
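The percentages above are shares of a total. Type shares such as Change Variable Type (11.8%) are presumably taken over all detected refactoring instances, and the 26.1% figure is taken over commits. The following is a minimal sketch of that arithmetic under those assumptions. The function names, the string labels, and the per-commit boolean "targets refactoring" flag are hypothetical illustrations, not the authors' pipeline, and the toy data is invented for demonstration only.

```python
from collections import Counter


def type_shares(refactoring_types: list[str]) -> dict[str, float]:
    """Percentage of all detected refactoring instances belonging to each type.

    Assumes one string label per detected instance (hypothetical input format).
    """
    counts = Counter(refactoring_types)
    total = len(refactoring_types)
    # most_common() orders types from most to least frequent.
    return {t: 100.0 * n / total for t, n in counts.most_common()}


def intentional_share(commit_targets_refactoring: list[bool]) -> float:
    """Percentage of commits whose stated purpose is refactoring (hypothetical flag)."""
    return 100.0 * sum(commit_targets_refactoring) / len(commit_targets_refactoring)


if __name__ == "__main__":
    # Toy data only; not taken from the AIDev dataset.
    toy_types = ["Rename Variable", "Change Variable Type",
                 "Change Variable Type", "Rename Parameter"]
    print(type_shares(toy_types))
    print(intentional_share([True, False, False, False]))
```

Because each share uses the same denominator (all instances), the type shares sum to 100%. The commit-level share uses a different denominator, so it is not comparable to them.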
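A reported value such as "Class LOC median Δ = -15.25" is most naturally read as the median of per-change differences (metric after refactoring minus metric before), where a negative value means the class got smaller. The abstract does not name the significance test it uses. As a simple stand-in, this sketch pairs the median delta with a two-sided exact sign test; the paper may well use a different paired test, such as the Wilcoxon signed-rank test. All names and the toy data are hypothetical.

```python
import math
from statistics import median


def metric_deltas(before: list[float], after: list[float]) -> list[float]:
    """Per-change metric difference, after minus before (negative = reduction)."""
    if len(before) != len(after):
        raise ValueError("before/after must be paired one-to-one")
    return [a - b for b, a in zip(before, after)]


def median_delta(before: list[float], after: list[float]) -> float:
    """Median of the paired differences, e.g. for class lines of code."""
    return median(metric_deltas(before, after))


def sign_test_p(deltas: list[float]) -> float:
    """Two-sided exact sign test of H0: median delta == 0.

    Zero deltas carry no sign information and are dropped, as is standard.
    """
    pos = sum(d > 0 for d in deltas)
    neg = sum(d < 0 for d in deltas)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy class-LOC values, invented for illustration.
    before = [120, 80, 200, 95, 60, 150]
    after = [100, 70, 180, 95, 50, 130]
    d = metric_deltas(before, after)
    print(d, median(d), sign_test_p(d))
```

With only five non-zero deltas, even a unanimous reduction gives p = 0.0625 here. This illustrates why a dataset of thousands of changes can reach significance on effects the abstract describes as small.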