
Repair-R1: Better Test Before Repair

July 30, 2025
Authors: Haichuan Hu, Xiaochen Xie, Quanjun Zhang
cs.AI

Abstract

APR (Automated Program Repair) aims to automatically locate program defects, generate patches, and validate the repairs. Existing APR techniques are often combined with LLMs (Large Language Models), leveraging the LLMs' code-related knowledge to improve repair effectiveness. Current LLM-based APR methods typically use test cases only during the inference stage, adopting an iterative approach that performs repair first and validates it through test execution afterward. This conventional paradigm neglects two important aspects: the potential contribution of test cases in the training phase, and the possibility of leveraging testing prior to repair. To address this, we propose Repair-R1, which introduces test cases into the model's training phase and shifts test generation to precede repair. The model is required to first generate discriminative test cases that can distinguish defective behaviors, and then perform repair based on these tests. This enables the model to better locate defects and understand their underlying causes, thereby improving repair effectiveness. We implement Repair-R1 with three different backbone models, using RL (reinforcement learning) to co-optimize test generation and bug repair. Experimental results on four widely adopted benchmarks demonstrate the superiority of Repair-R1. Specifically, compared to vanilla models, Repair-R1 improves the repair success rate by 2.68% to 48.29%, the test generation success rate by 16.38% to 53.28%, and test coverage by 0.78% to 53.96%. We publish the code and weights at https://github.com/Tomsawyerhu/APR-RL and https://huggingface.co/tomhu/Qwen3-4B-RL-5000-step.
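The "discriminative test" criterion at the heart of the approach can be made concrete with a small sketch (all names here are hypothetical illustrations, not from the paper's released code): a generated test counts as discriminative only if the buggy program fails it while a correct program passes it.

```python
# Illustrative sketch, assuming a test is an (input, expected_output) pair.
# A test is "discriminative" iff it exposes the defect: the buggy program
# fails it, while the repaired program passes it. A training signal of this
# shape is what rewards the model for generating useful tests before repair.

def buggy_abs(x):
    # Buggy: forgets to negate negative inputs.
    return x

def fixed_abs(x):
    return -x if x < 0 else x

def is_discriminative(test, buggy_fn, fixed_fn):
    """True iff `test` distinguishes the buggy program from the fixed one."""
    inp, expected = test
    return buggy_fn(inp) != expected and fixed_fn(inp) == expected

# (-5, 5) exercises the defect, so it is discriminative;
# (3, 3) passes on both programs, so it is not.
assert is_discriminative((-5, 5), buggy_abs, fixed_abs)
assert not is_discriminative((3, 3), buggy_abs, fixed_abs)
```

In the RL setup described above, a reward built from this kind of check can be combined with the repair reward so that test generation and bug repair are optimized jointly.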